Abstract

Moser & Tardos have developed a powerful algorithmic approach (henceforth “MT”) to the
Lovász Local Lemma (LLL); the basic operation done in MT and its variants is a search for
“bad” events in a current configuration. In the initial stage of MT, the variables are set
independently. We examine the distributions on these variables which arise during intermediate
stages of MT. We show that these configurations have a more or less “random” form, building
further on the “MT-distribution” concept of Haeupler et al. in understanding the (intermediate
and) output distribution of MT. This has a variety of algorithmic applications; the most
important is that bad events can be found relatively quickly, improving upon MT across the
complexity spectrum: it makes some polynomial-time algorithms sub-linear (e.g., for Latin
transversals, which are of basic combinatorial interest), gives lower-degree polynomial
run-times in some settings, transforms certain super-polynomial-time algorithms into
polynomial-time ones, and leads to Las Vegas algorithms for some coloring problems for which
only Monte Carlo algorithms were known.

We show that in certain conditions when the LLL condition is violated, a variant of the MT 
algorithm can still produce a distribution which avoids most of the bad events. We show in 
some cases this MT variant can run faster than the original MT algorithm itself, and develop 
the first-known criterion for the case of the asymmetric LLL. This can be used to find partial 
Latin transversals improving upon earlier bounds of Stein (1975) - among other applications. 

We furthermore give applications in enumeration, showing that most applications (where we 
aim for all or most of the bad events to be avoided) have large solution sets. We do this by 
showing that the MT-distribution has large Rényi entropy.

Key words and phrases: Lovász Local Lemma, Moser-Tardos algorithm, LLL-distribution,
MT-distribution, graph coloring, satisfiability, Latin transversals, combinatorial enumeration.

1 Introduction 

We consider a number of basic applications of the Lovász Local Lemma (LLL) in probabilistic
combinatorics and graph theory [5]: these include Latin transversals, hypergraph 2-coloring, various
types of graph coloring, k-SAT, versions of these problems where we satisfy “most” of the constraints
(as in MAX-SAT), and enumerating (lower-bounding) the number of solutions to these problems.
Recall that the LLL gives a powerful sufficient condition for avoiding all of a given set of bad
events. We study the seminal Moser-Tardos approach (henceforth “MT”) for algorithmic versions
of the LLL [32], presenting new analyses and branching processes to speed up the MT algorithm -
significantly in some cases (e.g., from exponential to polynomial, and from polynomial to sublinear);
furthermore, we improve upon the known sufficient conditions for only a “few” of the given bad
events to occur. A fundamental idea behind our work is that the structures arising in the execution
of MT are “random-like”, and that such average-case behavior can be used to good advantage.

*A preliminary version of this paper has appeared in the Proc. ACM-SIAM Symposium on Discrete Algorithms,
2016.

* Department of Computer Science, University of Maryland, College Park, MD 20742. Research supported in part
by NSF Awards CNS-1010789 and CCF-1422569. Email: davidgharris29@gmail.com

* Department of Computer Science and Institute for Advanced Computer Studies, University of Maryland, College
Park, MD 20742. Research supported in part by NSF Awards CNS-1010789 and CCF-1422569, and by a research
award from Adobe, Inc. Email: srin@cs.umd.edu

We refer to the distribution on the variables at the termination of the MT algorithm as the
MT-distribution. A key randomness property of this distribution has been demonstrated in [19].
We develop this further, showing that the intermediate structures arising in the execution of MT
have some very useful “random-like” properties, which can be exploited using additional ideas.

In the MT setting, we have a set of variables $X_1, \dots, X_n$. We also have a product probability
distribution $\Omega$, which selects an integer value $j$ for each variable $X_i$ with probability
$p_{i,j}$; the variables are drawn independently and $\sum_j p_{i,j} = 1$ for each $i$. We have events,
which are Boolean functions of subsets of the variables. We say that $E \sim E'$ iff $E, E'$ overlap in
some variable(s), i.e., if each of them involves some common $X_i$. (Note that we always have
$E \sim E$.) There is a set of $m$ bad events $\mathcal{B}$ which we are trying to avoid. In this
setting, the MT algorithm is as follows:

1. Draw $X_1, \dots, X_n$ from $\Omega$.

2. Repeat while there is some true bad event:

2a. Choose a currently-true bad event $B \in \mathcal{B}$ arbitrarily.

2b. Resample all the variables involved in $B$ from the restriction of $\Omega$ to just these variables.

(We refer to this step as resampling the bad event $B$.)
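The steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the representation of a bad event as a pair (variable indices, predicate), the names `moser_tardos` and `clause_event`, and the rule "resample the first true bad event found" are all assumptions made for the example. The search in step 2 is the naive full scan whose cost the later sections aim to reduce.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical representation: a bad event is the list of variable indices it
# depends on, together with a predicate on the full assignment.
BadEvent = Tuple[Sequence[int], Callable[[List[int]], bool]]


def clause_event(clause: Sequence[Tuple[int, int]]) -> BadEvent:
    """Bad event "this clause is violated"; clause = [(variable, wanted value), ...]."""
    fixed = tuple(clause)
    return [i for i, _ in fixed], (lambda x: all(x[i] != v for i, v in fixed))


def moser_tardos(samplers: List[Callable[[random.Random], int]],
                 bad_events: List[BadEvent],
                 rng: random.Random) -> Tuple[List[int], int]:
    """Run MT; return the final assignment and the number of resamplings."""
    # Step 1: draw every variable independently from its own distribution.
    x = [draw(rng) for draw in samplers]
    resamplings = 0
    while True:
        # Step 2: naive search, scanning every bad event.
        true_events = [ev for ev in bad_events if ev[1](x)]
        if not true_events:
            return x, resamplings
        # Step 2a: an arbitrary choice; here, the first true event found.
        variables, _ = true_events[0]
        # Step 2b: resample only the variables involved in that event.
        for i in variables:
            x[i] = samplers[i](rng)
        resamplings += 1
```

For instance, with fair-coin variables and one bad event per clause of a small 3-SAT formula, `moser_tardos` returns an assignment on which no clause is violated.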

For any event $E$ (whether in $\mathcal{B}$ or not), we let $N(E)$ denote the inclusive neighborhood
of $E$, viz. the set of all bad events $B \in \mathcal{B}$ such that $B \sim E$. This is “inclusive”
since $E \in N(E)$ for $E \in \mathcal{B}$.

When we are analyzing the MT algorithm, we let $T$ denote the termination time ($T = \infty$ if
the algorithm runs forever). For $t = 0, \dots, T$ we let $X^t$ denote the configurations of the variables
(the values of $X_1, \dots, X_n$) after $t$ resamplings; $X^0$ is the initial configuration (after step (1)). For
$t = 1, \dots, T - 1$, we let $B^t$ denote the bad-event which is resampled at time $t$.

In our analyses, there are two probability distributions at play. First, there is the distribution
$\Omega$, to which the LLL applies and which the MT algorithm is (in a certain sense) trying to simulate.
Second, there is the probability distribution which describes the execution of the MT algorithm;
this second probability distribution is the one that is “actually occurring.” In order to ensure that
this second probability distribution is well-defined, we assume that there is some fixed rule (possibly
randomized) for choosing which bad-event to resample. We refer to probabilities of the first type as
$P_\Omega$ and probabilities of the second type (which are the true probabilities of the events of interest)
as simply $P$.

The key criterion for the convergence of the MT algorithm is the “asymmetric LLL” [37]. We
state a slightly stronger form of this criterion due to Pegden [34]:

Theorem 1.1. Suppose there is $\mu : \mathcal{B} \to [0, \infty)$ such that for all $B \in \mathcal{B}$ we have

$$\mu(B) \ge P_\Omega(B) \times \sum_{\substack{I \subseteq N(B) \\ I \text{ independent set under } \sim}} \; \prod_{B' \in I} \mu(B'). \tag{1}$$

Then the MT algorithm terminates with probability 1; the expected number of resamplings of any
bad event $B \in \mathcal{B}$ is at most $\mu(B)$.¹

The “Symmetric LLL” is a special case of this, obtained by setting $\mu(B) = e \cdot P_\Omega(B)$:²

Theorem 1.2. Suppose $P_\Omega(B) \le p$ and $|N(B)| \le d$ for all $B \in \mathcal{B}$, with $epd \le 1$. Then the MT
algorithm terminates with probability 1, and the expected number of resamplings of any bad event
is at most $ep$.
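For small instances, criterion (1) can be checked directly by enumerating the independent subsets of each inclusive neighborhood. The following is a brute-force sketch for illustration only (exponential in the neighborhood size); the function names and the dictionary-based representation of $\sim$, $P_\Omega$ and $\mu$ are assumptions of this example, not notation from the paper.

```python
from itertools import combinations
from math import e, prod
from typing import Dict, Hashable, List, Set


def independent_set_sum(nbrs: List[Hashable],
                        overlaps: Dict[Hashable, Set[Hashable]],
                        mu: Dict[Hashable, float]) -> float:
    """Sum over independent I within nbrs of the product of mu[B'] for B' in I."""
    total = 0.0
    for size in range(len(nbrs) + 1):
        for subset in combinations(nbrs, size):
            # I is independent if no two distinct members overlap.
            if all(b not in overlaps[a] for a, b in combinations(subset, 2)):
                total += prod(mu[b] for b in subset)  # empty product is 1
    return total


def satisfies_asymmetric_criterion(prob: Dict[Hashable, float],
                                   overlaps: Dict[Hashable, Set[Hashable]],
                                   mu: Dict[Hashable, float]) -> bool:
    """Check inequality (1) for every bad event; overlaps[B] is the inclusive N(B)."""
    return all(
        mu[b] >= prob[b] * independent_set_sum(list(overlaps[b]), overlaps, mu)
        for b in prob
    )


def symmetric_weights(prob: Dict[Hashable, float]) -> Dict[Hashable, float]:
    """The choice mu(B) = e * P(B) used to derive the symmetric LLL."""
    return {b: e * p for b, p in prob.items()}
```

As a usage example, take four bad events arranged in a cycle, each overlapping itself and its two neighbors (so $d = 3$). With $p = 0.05$ we have $epd \approx 0.41 \le 1$, and the check succeeds with the weights from `symmetric_weights`; with $p = 0.5$ the same weights fail the check.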

The MT algorithm can give polynomial-time algorithms for nearly all applications of the Lovász
Local Lemma. Yet, implemented directly, this algorithm can be fairly slow. The key bottleneck 
is that, in each step of the algorithm, one must search for currently-true bad events (or certify 
there are none). We show, by understanding the MT-distribution and some of its relatives better, 
that the configurations which arise during the execution of MT have a more or less “random” 
form, and that currently-true bad events can be found relatively quickly in expectation. Our main 
contributions are as follows. 

(a) From super-polynomial to polynomial time, and from Monte Carlo to Las Vegas. 

The MT algorithm, as described, may not run in poly(n) time if the number of bad events is
super-polynomial. This issue is addressed in [19], where polynomial-time algorithms are developed for
many such cases. However, the framework of [19] has some important limitations. First, it typically
requires satisfying the LLL criterion with an additional slack. This means that one typically obtains
worse constructive bounds than the existential ones possible from the LLL. Second, this framework
leads to Monte Carlo algorithms; that is, the algorithm terminates and there is a high probability
(but not certainty) of success. These problems are both present for the class of problems based on
non-repetitive vertex colorings. In Section 5 we present improved algorithms for these problems;
our algorithm leads to essentially the same parameters as the non-constructive LLL, and is Las
Vegas.

(b) Improved polynomial run-times. We also significantly improve the run-times of certain
combinatorial algorithms. In Section 3.2 we give improved algorithms for Ramsey number lower
bounds. In Section 3.4 we give improved algorithms for hypergraph 2-coloring, reducing a quadratic
run-time to a quasi-linear run-time. In Section 4 we give the first sub-linear algorithm for Latin
transversals: one that runs in time proportional to the square root of the input length. Latin
transversals and their “partial transversal” variants are well-studied in combinatorics; we
encounter the latter in item (c) next.

(c) Partially avoiding bad events. In some cases, the LLL criterion is not satisfied, and one
cannot necessarily avoid all the bad events. However, one can still avoid most of the bad events.
This issue was first examined in [19], which extended the symmetric LLL to the case when $epd = \alpha$,
for $\alpha \in [1, e]$, and $d$ was large: they gave a randomized algorithm whose expected number of bad
events at the end is $(1 + o(1)) \cdot mp \cdot (e \ln(\alpha)/\alpha) = (1 + o(1)) \cdot \frac{m \ln \alpha}{d}$, where the “$o(1)$” term is a
function of $d$ that tends to zero for large $d$. No such results were known for the general asymmetric
LLL (Theorem 1.1) or symmetric LLL for small $d$. We develop the first “few bad events” variant of
Theorem 1.1 in Section 6, and also obtain an exact result for the symmetric LLL by removing
the “$o(1)$” term above (Corollary 6.2).

¹ Clearly, $\mu(B) \ge P_\Omega(B) \times \sum_{I \subseteq N(B)} \prod_{B' \in I} \mu(B') = P_\Omega(B) \times \prod_{B' \in N(B)} (1 + \mu(B'))$ is a sufficient condition for (1).
Setting $x(B) = \mu(B)/(\mu(B) + 1)$ in this sufficient condition recovers the usual formulation of the asymmetric LLL.

² In other formulations of the symmetric LLL, $N(B)$ is defined to be the exclusive neighborhood (not counting $B$
itself), and hence the criterion becomes $ep(d + 1) \le 1$. The reader should bear in mind that in this paper, $N(B)$
non-standardly refers to the inclusive neighborhood.
These results apply to many forms of the Lopsided Lovász Local Lemma (LLLL) (an extension
of the LLL to probability spaces in which the bad-events are “negatively correlated” in a certain
technical sense; see [IS]). Some well-known applications of the LLLL which we treat here include
random permutations and $k$-SAT. Our algorithms here are also much faster than those of [19]. Some
applications of this technique are also given to partial Latin transversals, improving upon [38].

(d) Entropy of the MT-distribution and combinatorial enumeration. We show another
concrete way in which the MT-distribution has significant randomness - that its Rényi entropy
[12] is relatively close to that of the initial product distribution. (The min-entropy is a special
case of the Rényi entropy and has become a central notion in randomness extractors and explicit
constructions.) For many applications of the LLL, such as $k$-SAT, non-repetitive coloring etc.,
this implies that the solution set has greater cardinality than was known
before; perhaps more excitingly, it further builds on item (c) above to prove for the first time that
MAX-SAT instances, as just one example, have several good solutions.

To summarize, we consider some basic applications of the LLL, and develop (much) faster
algorithms for these, some of which are the first-known polynomial-time or Las Vegas algorithms.
We also present improved or new algorithms and enumerative results in settings where we can allow
a few bad events to happen. The impetus behind our work is further investigation of the
MT-distribution and some of its relatives.

1.1 Technical overview 

The original analysis of Moser & Tardos gave sufficient conditions for their MT algorithm to
terminate, yielding a configuration without bad-events. However, often one would like more information
about such configurations, beyond the bare fact that they exist. As shown in [19], one can define
an MT-distribution: the probability distribution induced on configurations that are output from
the MT algorithm. The MT-distribution was used by [19] to show that in various MT applications,
one can guarantee that the output of the MT algorithm has additional good properties.

Another useful application of this principle comes from [23], which uses the MT-distribution to
find configurations (e.g. independent transversals) which have certain large-scale average properties 
as well. For example, one may define a weighting function on elements and find configurations with 
high overall weight, by examining the expected weight in the MT-distribution. 

In this paper, we take the notion of the MT-distribution much further: not only can one
analyze the probability distribution on the output of the MT algorithm, but one can also analyze
the distribution on its intermediate states. These intermediate distributions share many properties
with the original sampling distribution $\Omega$, which is just a product distribution. In particular, the
key step of the MT algorithm (the search for currently-true bad events) is quite similar to a
search problem over a random configuration. Random configurations are often easy to search: for
example, while deciding $k$-colorability is NP-hard in general, a simple algorithm of [28] solves it for
Erdős-Rényi random graphs in expected polynomial time.

The key step of the MT algorithm thus often boils down to finding a bad-event in a (nearly) 
random configuration. This can often be accomplished by branching algorithms, in which one
gradually builds up a putative true bad event by “guessing” successively more of its state. At every 
step, one can check whether the partial bad event is extendable to a full bad event, and abort 
the search if not. Using the randomness of the configuration, one can show that there is a good 
probability of aborting early. 
1.2 Outline


In Section 2 we review the analysis of the MT algorithm. We describe witness trees, a key
proof-technique for showing the convergence of that algorithm, which also plays a key role in
understanding the MT-distribution. We also introduce a new variant of the critical Witness Tree Lemma,
which allows us to bound the probability of events in internal states of the MT algorithm.

Section 3 describes our basic algorithms and data structures. Two applications are given,
for Ramsey numbers and for hypergraph 2-coloring. They are good representatives of “typical”
applications in combinatorics and algorithms, and they show how these techniques can lead to faster
algorithms for many LLL applications, even those which already have polynomial-time algorithms.

Section 4 analyzes a variant of the MT algorithm for random permutations, and shows that
one can obtain the first sub-linear (square-root of input size) algorithms for Latin transversals, a
problem of fundamental combinatorial interest.

Section 5 addresses non-repetitive vertex coloring - one of the few remaining cases where
polynomial-time versions of the LLL were not known - and develops such polynomial-time
versions.

Section 6 addresses the problem of partially avoiding bad events, in cases where the LLL criterion
is not satisfied. We tighten the bounds of [19], giving a symmetric criterion in the case when
$epd = \alpha$, for $\alpha \in [1, e]$, as well as, for the first time, an asymmetric criterion. Furthermore, we
give a parallel algorithm in this case whose running time improves on that obtained by applying
the parallel MT algorithm directly, as in [19].

Section 7 estimates the entropy of the MT-distribution, and shows that it is close to the original
distribution. This automatically implies that there are many more solutions than known before
for various problems such as $k$-SAT, non-repetitive coloring, and independent transversals - and
especially the maximum-satisfiability variants of these problems.

2 Witness trees and the MT-distribution 

The analysis of [32] is based on witness trees, an analytical tool which provides the history of
all variables that lead up to a resampling. These give an explanation or witness for each of the
resamplings that occurs during the MT algorithm. As shown in [19], these witness trees can also
be used to give explanations for other types of events (not necessarily bad events). We will give a
very brief overview of these results here; the reader should consult [32] and [19] for a much more
in-depth explanation of these concepts.

Suppose we run the MT algorithm, and we resample the bad-events $B^1, \dots, B^T$ in order; the
MT algorithm may or may not have terminated by this point. We may produce a witness tree $\hat{\tau}^k$
for the $k$th resampling, as follows. We begin by placing a singleton root node labeled $B^k$. We then
proceed backward for $t = k - 1, k - 2, \dots, 1$; for each bad-event $B^t$, we see if there are any nodes
of $\hat{\tau}^k$ which are labeled by some $B' \sim B^t$. If there are not, then we do not modify $\hat{\tau}^k$. If there are, we
select one such node at greatest depth in $\hat{\tau}^k$, and attach to it a new leaf node labeled $B^t$.
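The construction just described can be sketched as follows. This is an illustrative sketch only: the flat list-of-nodes representation, the name `build_witness_tree`, and the `overlaps` callback standing in for the relation $\sim$ are assumptions of this example. Ties between nodes at the same greatest depth are broken arbitrarily (here, by taking the earliest-created node).

```python
from typing import Callable, Hashable, List, Optional, Tuple

# Each node is (label, depth, index of parent); node 0 is the root.
Node = Tuple[Hashable, int, Optional[int]]


def build_witness_tree(log: List[Hashable], k: int,
                       overlaps: Callable[[Hashable, Hashable], bool]) -> List[Node]:
    """Witness tree for the k-th resampling (1-indexed) in the execution log."""
    nodes: List[Node] = [(log[k - 1], 0, None)]   # root labeled B^k
    for t in range(k - 1, 0, -1):                 # t = k-1, k-2, ..., 1
        event = log[t - 1]
        # Nodes whose label overlaps the event resampled at time t.
        candidates = [i for i, (label, _, _) in enumerate(nodes)
                      if overlaps(label, event)]
        if not candidates:
            continue                              # tree is left unmodified
        # Attach a new leaf below one such node of greatest depth.
        parent = max(candidates, key=lambda i: nodes[i][1])
        nodes.append((event, nodes[parent][1] + 1, parent))
    return nodes
```

For example, if event A uses variables {0, 1}, B uses {1, 2}, C uses {5}, and the log is A, C, B, A, then the tree for the fourth resampling is the path A (root), B, A; the resampling of C is not recorded because C overlaps no node of the tree.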

In this description, $\hat{\tau}^k$ is a random variable. One may also fix a specific labeled tree $\tau$, and
examine if $\hat{\tau}^k = \tau$ for any value of $k$. If there is some value of $k$ for which $\hat{\tau}^k = \tau$, we say that
$\tau$ appears. To distinguish these related notions, we use the term “tree-structure” to refer to a
particular labeled tree which could be produced as a value for the (random variable) $\hat{\tau}^k$.

The key lemma in [32], which governs the behavior of the MT algorithm, is the Witness Tree 
Lemma: 

Definition 2.1 (Weight of a witness tree). For any tree-structure $\tau$, whose nodes are labeled by
events $B_1, \dots, B_s$, we define the weight of $\tau$ by $w(\tau) = \prod_{k=1}^{s} P_\Omega(B_k)$.

Lemma 2.2 (Witness Tree Lemma). For any tree-structure $\tau$, $P(\tau \text{ appears}) \le w(\tau)$.

One key result of [32] is the following: 

Proposition 2.3 ([32]). Let $B$ be any bad event. The total weight of all tree-structures rooted in
$B$ is at most $\mu(B)$.

In [19], Lemma 2.2 and Proposition 2.3 were extended to arbitrary events. Given some
event $E$ which occurs during the MT algorithm, one can build a “witness tree” for it. The tree
has a root node, labeled by $E$; one constructs the remainder of the tree in the same manner as
we have previously described, going backward in time and inserting nodes labeled by bad-events.
These trees have a slightly different form to those analyzed by Moser & Tardos; their root node is
labeled by $E$, and all the other nodes are labeled by bad-events.

Given a tree-structure $\tau$ rooted in $E$, we say that $\tau$ appears if $\hat{\tau}^k = \tau$, where $k$ is some time
at which $E$ is true during the MT algorithm. The weight of such a tree, whose nodes are labeled
by events $E_1, \dots, E_s$ (which are not all necessarily bad-events), is $\prod_{k=1}^{s} P_\Omega(E_k)$. The Witness Tree
Lemma applies here as well:


Proposition 2.4 ([19]). Let $\tau$ be a tree-structure rooted in $E$. The probability that $\tau$ appears is at
most $w(\tau)$.


In order to state the result of [19], it will be convenient to have the following notation: for any
event $E$, we define

$$\theta(E) = P_\Omega(E) \sum_{\substack{I \subseteq N(E) \\ I \text{ independent}}} \; \prod_{B \in I} \mu(B) \tag{2}$$

Note that

$$\theta(E) \le P_\Omega(E) \prod_{B \sim E} (1 + \mu(B)) \le P_\Omega(E) \exp\Big(\sum_{B \sim E} \mu(B)\Big)$$

for any event $E$, where $\exp(t)$ denotes $e^t$. Also, note that in the symmetric LLL setting, we have
$\theta(E) \le P_\Omega(E) \exp(e \cdot p \cdot |N(E)|)$. The asymmetric LLL criterion can be summarized compactly as
$\mu(B) \ge \theta(B)$ for all $B$.


Proposition 2.5 ([32]). Let $E$ be any event. The total weight of all tree-structures with a root
node $E$, and the remaining nodes consisting of bad-events, is at most $\theta(E)$. Hence, the probability
that event $E$ occurs in the output of the MT-distribution is at most $\theta(E)$.³


2.1 A witness tree lemma for internal states 

We now introduce a key lemma which allows us to bound the probability of events occurring in
internal states of the MT algorithm. One crucial feature of this lemma is that we can not only
bound the probability that $E$ occurs, but we can also bound the expected number of times it occurs.

Lemma 2.6. Let $E$ be any event, and let $B \in \mathcal{B}$. Then

$$\sum_{t=1}^{T} P(E(X^t) \wedge B^t = B) \le \mu(B)\,\theta(E).$$

(To clarify the notation, $E(X^t)$ means that event $E$ is true in the configuration $X^t$.)

³ We note that in [19] a slightly weaker result was proved; this Proposition 2.5 follows easily by combining Pegden’s
analysis [34] and Bissacot et al.’s cluster-expansion criterion [8] with the ideas of [19].
Proof. For each time $t$ satisfying $E(X^t)$ and $B^t = B$, one may construct a type of witness tree
which we denote $\hat{\tau}^t$. This is constructed in a similar manner to that of [19]. We place a node labeled
by $E$ at the root and place a child node labeled by $B$ below it. (Note that we do not necessarily
have $E \sim B$, and so the $B$ would not necessarily have been placed as a child of $E$ in the standard
method for generating witness trees.) We then go backward in time through the execution log of
the MT algorithm, placing any resampled bad events in the tree (as children of $E$ or $B$ or lower nodes).

We refer to the set of possible witness trees that can be produced in this fashion as
$E/B$-tree-structures.

We note that all the witness trees that are produced in this fashion are distinct; for, in the $k$th
resampling of $B$, the witness tree $\hat{\tau}^t$ has $k$ nodes which have label $B$. This implies that

$$\sum_{t=1}^{T} [E(X^t) \wedge B^t = B] \le \sum_{E/B\text{-tree-structures } \tau} [\tau \text{ appears}]$$

where $[E(X^t) \wedge B^t = B]$ is (here and throughout the paper) the Iverson notation, which is one if
$E(X^t) \wedge B^t = B$ is true and zero otherwise.

Next, one may show that the witness tree lemma holds for $E/B$-tree-structures. Namely, for
each fixed tree-structure $\tau$, we have $P(\tau \text{ appears}) \le w(\tau)$. (The proof of this is nearly identical to
Proposition 2.5.) Hence we have

$$\sum_{t=1}^{T} P(E(X^t) \wedge B^t = B) \le \sum_{E/B\text{-tree-structures } \tau} w(\tau)$$

So let us consider the total weight of all such $E/B$-tree-structures. We define a mapping $f$
from pairs of tree-structures $\tau_1, \tau_2$ rooted in $E, B$ respectively to an $E/B$-tree $\tau = f(\tau_1, \tau_2)$. This
mapping is defined by adding $\tau_2$ as a child of the root node of $\tau_1$.

This mapping is surjective: given an $E/B$-tree-structure $\tau$, which has a root node $E$ and a
child node $v$ labeled $B$, let $\tau_2$ be the subtree rooted at $v$ and let $\tau_1 = \tau - \tau_2$; then $f(\tau_1, \tau_2) = \tau$.
Furthermore, this mapping has the property that $w(f(\tau_1, \tau_2)) = w(\tau_1) w(\tau_2)$. Thus, we have that

$$\sum_{E/B\text{-tree-structures } \tau} w(\tau) \le \sum_{\substack{\text{tree-structures } \tau_1 \\ \text{rooted at } E}} \; \sum_{\substack{\text{tree-structures } \tau_2 \\ \text{rooted at } B}} w(f(\tau_1, \tau_2)) = \sum_{\substack{\text{tree-structures } \tau_1 \\ \text{rooted at } E}} \; \sum_{\substack{\text{tree-structures } \tau_2 \\ \text{rooted at } B}} w(\tau_1) w(\tau_2)$$

By Proposition 2.5 we have $\sum_{\text{tree-structures } \tau \text{ rooted at } E} w(\tau) \le \theta(E)$. By Proposition 2.3 we
have $\sum_{\text{tree-structures } \tau \text{ rooted at } B} w(\tau) \le \mu(B)$. Hence the total weight of all $E/B$-tree-structures is
at most $\mu(B)\theta(E)$.

□


3 Fast search for bad events 

To implement the MT algorithm, we must search for any bad-events which are currently true (or
certify there are none). The simplest way to do this would be to check the entire set $\mathcal{B}$ in each
iteration. This will cost $\Omega(m)$ time per iteration (at least). If the bad-events are provided to us as an
arbitrary list, this is optimal. However, most applications of the LLL have more bad events than
variables, and these bad events are much more structured.
Consider the very first iteration of the MT algorithm, searching for currently-true bad-events.
In this case, the variables $X$ are distributed according to $\Omega$, a product distribution. For many
problems, one can search random configurations faster (in expectation) than arbitrary configurations.
Thus, one should be able to perform the first search step much faster than $\Omega(m)$ time. As the MT
algorithm proceeds, the distribution becomes distorted. However, we prove that it does not stray
too far from its original distribution. Thus, one can still hope to find bad-events significantly faster
on these intermediate distributions than on arbitrary distributions.

For most applications of the MT algorithm, including all those in this paper, the remaining 
steps of the MT algorithm can be done relatively efficiently. For example, resampling each variable 
typically takes $O(1)$ time. As the work of resampling variables will always be negligible compared
to finding true bad-events, we will ignore this cost throughout. 

3.1 Efficient search algorithms 

One main ingredient of our algorithms is a problem-specific search algorithm S which given an 
assignment X of the variables, determines all the bad-events currently true on X. This search 
procedure may be randomized, consuming a random source R (which is independent of the random 
source used to drive the MT algorithm itself). We refer to this as S(X, R). 

In many settings, finding a search algorithm which gives good worst-case bounds can be difficult 
or impossible. However, we will seek to parametrize the run-time of S so that we can analyze its 
behavior on distributions drawn from the intermediate stages of MT. We thus define an
event-decomposition for $S$ to be a set of events $A_i$ (not necessarily bad events) and constant terms $c_i$,
where $i$ ranges over the integers, with the property that

$$E_R[\mathrm{Time}(S(X, R))] \le \sum_i c_i [A_i(X)]. \tag{3}$$

It is important to note in this definition that the expectation is taken only over the random source
$R$ consumed by $S$, not on the randomness of the MT process itself.

We can now measure the running time of MT as follows: 

Theorem 3.1. Given an event-decomposition for $S$ as in (3), define $T = \sum_i c_i \theta(A_i)$. Then,
$E[\text{run-time of MT}] \le \big(1 + \sum_{B \in \mathcal{B}} \mu(B)\big) T$.

Proof. We sum over the times $t = 0, \dots, T$ so that

$$E[\text{time}] \le \sum_{t=0}^{T} E_R[\mathrm{Time}(S(X^t, R))] \le \sum_{t=0}^{T} \sum_i c_i P(A_i(X^t))$$

We first consider time $t = 0$. The configuration $X^0$ has exactly the distribution $\Omega$, hence
$P(A_i(X^0)) = P_\Omega(A_i(X)) \le \theta(A_i)$.

Next, for each time $t = 1, \dots, T$ we have that

$$\sum_{t=1}^{T} P(A_i(X^t)) = \sum_{B \in \mathcal{B}} \sum_{t=1}^{T} P(A_i(X^t) \wedge B^t = B).$$

By Lemma 2.6 this is at most $\sum_{B \in \mathcal{B}} \mu(B)\theta(A_i)$. The result follows.
3.2 Example: Faster algorithms to construct Ramsey graphs

A classical result in combinatorics is the lower bound on the diagonal Ramsey number
$R(k, k) > \frac{\sqrt{2}}{e} k 2^{k/2}$ via the LLL [5]. This can be viewed also as an algorithmic challenge: given $k$,
two-color the edges of the complete graph $K_n$ for $n = \lceil \frac{\sqrt{2}}{e} k 2^{k/2} \rceil$, such that no $k$-clique has
all $\binom{k}{2}$ edges of the same color.

Proposition 3.2 (Follows straightforwardly from MT). For $n = \lceil \frac{\sqrt{2}}{e} k 2^{k/2} \rceil$, there is an algorithm
to construct a two-coloring of $K_n$ avoiding monochromatic $k$-cliques, in expected $2^{k^2/2 + o(k^2)}$ time.

Proof. For each $k$-clique, there is a bad-event that it is monochromatic; this has probability
$p = 2^{1 - \binom{k}{2}}$. There are $m = \binom{n}{k} \le n^k/k!$ cliques, and so the expected number of resamplings is at most
$mep$. For each resampling, we check each $k$-clique, which takes $\binom{k}{2} m$ time. Thus, the total expected
time is $O(ep\binom{k}{2}m^2) \le 2^{k^2/2 + o(k^2)}$. □

Although there are exponentially many bad-events in this case, they have a combinatorial 
structure and it is not necessary to search each bad-event individually. Rather, we can use a type 
of branching algorithm to enumerate the cliques. This search algorithm was developed in [21] in
the context of a similar application of the LLL; however, in that case, it was only necessary to 
analyze the initial configuration. 

Proposition 3.3. There is a deterministic search algorithm $S$ for monochromatic $k$-cliques with
an event decomposition

$$\mathrm{Time}(S(X)) = n^{O(1)} \sum_{\substack{\text{cliques } I \\ |I| \le k}} [I \text{ monochromatic on } X]$$

Proof. We recursively enumerate all monochromatic $i$-cliques, for $i = 2, \dots, k$. Initially, every edge is a
monochromatic 2-clique. Next, for each monochromatic $i$-clique $I$, we test all possible vertices $v$ and check
if $I \cup \{v\}$ is also monochromatic. It takes $\binom{i}{2}$ time to check each $i$-clique, so the total time for this
process (extending a given $(i-1)$-clique to $i$-cliques) is at most $O(n\binom{i}{2}) \le n^{O(1)}$. □
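The level-by-level enumeration in this proof might look as follows in code. This is a minimal sketch under assumed conventions: the coloring is a dictionary from `frozenset({u, v})` to a color in {0, 1}, and the function name `monochromatic_cliques` is hypothetical. Two small liberties are taken relative to the proof: each clique is only extended by vertices larger than its current maximum (so every clique is produced once), and only the edges to the new vertex are re-checked, since the smaller clique is already known to be monochromatic.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Coloring = Dict[FrozenSet[int], int]   # color (0 or 1) of each edge {u, v}


def monochromatic_cliques(n: int, k: int, color: Coloring) -> List[Tuple[int, ...]]:
    """Return all monochromatic k-cliques of K_n as increasing vertex tuples."""
    # Level 2: every edge is a monochromatic 2-clique.
    level: List[Tuple[int, ...]] = list(combinations(range(n), 2))
    for _ in range(3, k + 1):
        next_level: List[Tuple[int, ...]] = []
        for clique in level:
            c = color[frozenset(clique[:2])]   # the common color of this clique
            for v in range(clique[-1] + 1, n):
                # Extend only if every edge to the new vertex has color c.
                if all(color[frozenset((u, v))] == c for u in clique):
                    next_level.append(clique + (v,))
        # Cliques that could not be extended are dropped here: the branching
        # process aborts on them.
        level = next_level
    return level
```

The running time is proportional (up to polynomial factors) to the number of monochromatic cliques of size at most $k$ in the current coloring, which is the quantity that the event decomposition of Proposition 3.3 charges for.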

Proposition 3.4. For $n = \lceil \frac{\sqrt{2}}{e} k 2^{k/2} \rceil$, there is an algorithm to construct a two-coloring of $K_n$
avoiding monochromatic $k$-cliques, in expected $2^{k^2/8 + o(k^2)}$ time.

Proof. We apply Theorem 3.1 to the event-decomposition of Proposition 3.3. We have:

$$T = n^{O(1)} \sum_{\substack{\text{cliques } I \\ |I| \le k}} \theta(I \text{ monochromatic on } X)$$

$$\le n^{O(1)} \sum_{i=2}^{k} \; \sum_{i\text{-cliques } I} 2^{1 - \binom{i}{2}} \exp(ep|N(I)|)$$

$$\le n^{O(1)} \sum_{i=2}^{k} n^i 2^{-\binom{i}{2}} \exp\big(ep\, i^2 n^{k-2} / (k-2)!\big)$$

$$\le 2^{k^2/8 + o(k^2)}$$

Now, $\sum_B \mu(B) \le mep = 2^{O(k)}$. Hence by Theorem 3.1 the overall run-time of MT is $2^{k^2/8 + o(k^2)}$.

□
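To see where the exponent $k^2/8$ comes from, the following rough calculation may help. It is a sketch rather than part of the original proof: it uses only $n = 2^{k/2 + o(k)}$ and takes for granted, as the displayed bound implies, that the $\exp(\cdot)$ factor contributes at most $2^{o(k^2)}$.

```latex
\begin{align*}
n^i \, 2^{-\binom{i}{2}}
  &= 2^{\, ik/2 \,-\, i^2/2 \,+\, o(k^2)}
  && \text{since } n = 2^{k/2 + o(k)} \text{ and } \tbinom{i}{2} = \tfrac{i^2}{2} - \tfrac{i}{2}, \\
\max_{2 \le i \le k} \left( \frac{ik}{2} - \frac{i^2}{2} \right)
  &= \frac{k^2}{4} - \frac{k^2}{8} = \frac{k^2}{8}
  && \text{attained at } i = k/2 .
\end{align*}
```

The sum over $i$ has only $k$ terms, so it is dominated by the term near $i = k/2$, i.e., by the number of monochromatic cliques of about half the target size.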
This is a polynomial improvement over Proposition 3.2, roughly reducing the time to the fourth
root.

Many of our algorithms to search for bad-events have the same flavor as the search for Ramsey 
graphs: we want to find some structured bad-event, which involves many variables. Instead of 
seeking to enumerate over the entire set of variables at once, we build up the variables gradually. 
This leads to a type of branching process. At level i of the process, we have “guessed” a set of i 
variable indices; we then check whether it is possible that there is a bad-event involving them. If 
we can rule this out, we abort the branching process; otherwise we extend it by trying to add a new 
variable. We refer to each partial list of variables, which is putatively involved in a true bad-event, 
as a story. For example, in the case of Ramsey graphs, a story is an $i$-clique for $i \le k$.

3.3 Depth-first-search Moser-Tardos 

As we have seen, the main cost in the MT algorithm is to search for any bad-events which are 
currently true (or certify there are none). The simple way to do this, as we have discussed in
Section 3, is to check the entire set $\mathcal{B}$ in each iteration. This is rather wasteful; an optimization
suggested by Joel Spencer is to maintain a stack which records all the currently-true bad-events.
At the very beginning of the MT algorithm, we scan the entire set B to find all the true bad-events. 
Whenever we resample a bad-event B, we only need to check its neighbors to determine whether 
they became true (and if so, we add them to the stack); we do not need to search the entire space. 

For example, in the symmetric LLL setting, we must expend $O(d)$ work after each resampling
(assuming that we have an adjacency list for the dependency graph and it requires unit time
to check a bad-event). As the expected number of resamplings overall is $O(m/d)$, this gives a total
expected running time $O(m)$. If the bad-events are simply provided to us as an arbitrary list, this
is already optimal.

We refer to this as a “depth-first-search” MT. This can potentially improve the runtime of MT
by up to a factor of $n$, because instead of needing to re-scan all the bad-events, we only need to
scan those affected by the most-recently-resampled variables.
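A minimal sketch of this stack-based variant is given below. It is an illustration under assumptions, not the paper's data structure: bad events are again (variable indices, predicate) pairs, the names `stack_moser_tardos` and `clause_event` are hypothetical, and the dependency graph is built explicitly as an adjacency list. Stack entries may be stale (an event can become false after a later resampling), so each entry is re-checked when it is popped.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

BadEvent = Tuple[Sequence[int], Callable[[List[int]], bool]]


def clause_event(clause: Sequence[Tuple[int, int]]) -> BadEvent:
    """Bad event "this clause is violated"; clause = [(variable, wanted value), ...]."""
    fixed = tuple(clause)
    return [i for i, _ in fixed], (lambda x: all(x[i] != v for i, v in fixed))


def stack_moser_tardos(samplers: List[Callable[[random.Random], int]],
                       bad_events: List[BadEvent],
                       rng: random.Random) -> Tuple[List[int], int, int]:
    """Return (assignment, number of resamplings, number of event checks)."""
    x = [draw(rng) for draw in samplers]
    # Adjacency list: events sharing a variable (inclusive of the event itself).
    by_var: Dict[int, List[int]] = {}
    for j, (variables, _) in enumerate(bad_events):
        for i in variables:
            by_var.setdefault(i, []).append(j)
    neighbors = [sorted({j2 for i in variables for j2 in by_var[i]})
                 for variables, _ in bad_events]
    # One full scan at the very beginning.
    stack = [j for j, (_, pred) in enumerate(bad_events) if pred(x)]
    checks, resamplings = len(bad_events), 0
    while stack:
        j = stack.pop()
        variables, pred = bad_events[j]
        checks += 1
        if not pred(x):
            continue                      # stale entry: no longer true
        for i in variables:
            x[i] = samplers[i](rng)
        resamplings += 1
        # Only neighbors of the resampled event can have changed.
        for j2 in neighbors[j]:
            checks += 1
            if bad_events[j2][1](x):
                stack.append(j2)
    return x, resamplings, checks
```

The invariant is that every currently-true bad event has at least one entry on the stack, so an empty stack certifies that no bad event holds.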

For applications with structured bad-events, we can speed up the depth-first search strategy by 
taking advantage of the random nature of the MT-distribution. We can hope to design a search 
algorithm which takes as input a configuration of variables, and a bad-event B, and lists all of the 
bad events B' ~ B which hold in it. 

A key ingredient: data structure D. One main ingredient of our algorithms is a problem- 
specific data-structure D which, given a bad event B and a configuration X, can determine all 
the bad events B' ~ B which may be caused to be true by resampling X. This data-structure 
also requires an initialization step, in which given a variable-assignment X we find all bad events 
currently true in it, as well as recording any other information about X needed to use the data 
structure later. (Initialization is typically much cheaper and simpler than the updating step, and 
is only performed once, so we mostly ignore it in our analyses.) 

In addition, we may want to use a randomized data-structure; we allow $D$ to use a random
bit-string $R$ (which is independent of the randomness used to drive the MT algorithm itself). This
leads to the following formulation:

Theorem 3.5. Suppose that we are given an event-decomposition $\{A_{B,i} \mid B \in \mathcal{B}\}$ and a
randomized data-structure $D$ which satisfies the following condition:

Suppose that, given a bad-event $B$ and configuration $X$, the data-structure $D(B,X)$ finds all
the bad-events which are true on $X$ and are dependent with $B$. Furthermore, for any fixed $B, X$
suppose we have

\[ E_R\big[\mathrm{Time}(D(B,X))\big] \le \sum_i c_{B,i} [A_{B,i}(X)] \]

For each event $B$, define $T_B = \sum_i c_{B,i}\,\theta(A_{B,i})$.

Then, the expected run-time of the MT algorithm, exclusive of time required for the initialization
steps, is at most $\sum_{B \in \mathcal{B}} \mu(B) T_B$.


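Proof. We sum over time $t = 1, \dots, T$:

\begin{align*}
E\Big[\sum_{t=1}^{T} \mathrm{Time}(D(B_t, X_t))\Big]
&\le E\Big[\sum_{t=1}^{T} \sum_i c_{B_t,i} [A_{B_t,i}(X_t)]\Big] \\
&= E\Big[\sum_{t=1}^{T} \sum_{B \in \mathcal{B}} \sum_i c_{B,i} [A_{B,i}(X_t) \wedge B_t = B]\Big] \\
&= \sum_{B \in \mathcal{B}} \sum_i c_{B,i} \sum_{t=1}^{T} P(A_{B,i}(X_t) \wedge B_t = B) \\
&\le \sum_{B \in \mathcal{B}} \mu(B) \sum_i c_{B,i}\,\theta(A_{B,i}) \qquad \text{by Lemma [276]} \\
&\le \sum_{B \in \mathcal{B}} \mu(B) T_B
\end{align*}

□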


3.4 Example: hypergraph two-coloring 

We consider a more technically involved example. Suppose we are given a $k$-uniform hypergraph
with $m$ hyper-edges, and we wish to find a two-coloring of the vertices so that no edge is
monochromatic. For each edge $f$, let $N(f)$ denote the edges which intersect with $f$ (including
$f$ itself). If $|N(f)| \le L \le 0.17\sqrt{k/\ln k}\,2^k$ for all edges $f$, then MT can be applied
to the approach of [35] to find a good coloring. The analysis of [35] introduces a separate
bad-event for each intersecting pair of edges; thus, straightforward analysis would indicate a
running time $mL \cdot \mathrm{poly}(k)$; potentially, a quadratic-time algorithm. (Another variant
of that algorithm, given in [IT], would lead to an analogous result.) We reduce this to
$m \log^{O(1)} m$ time.

Set-up for the LLL. We begin by describing a version of the algorithm of [35] to find such 
a coloring via the LLL. First, each vertex chooses a color at random. Next, we choose a random 
ordering of the vertices (equivalently, each vertex independently chooses a random rank $\rho_v \in [0,1]$).
For each vertex v in this order, we look for any monochromatic edges of which v is the lowest-ranking 
vertex. If we find any such edge, we flip the color of v. 

It is easy to implement this procedure in time $O(m)$, but the probability that it succeeds can
be very low when $m \gg L$. We will assume that m > {^tE); otherwise, as shown in [35],
this algorithm produces a good coloring with probability $\Omega(1)$.

This procedure fails to produce a valid coloring only if the following occurs. There is some edge
$f$, originally colored blue (w.l.o.g.), and vertex $v \in f$ is the lowest-ranking vertex of $f$.
There is another edge $f'$, which intersects $f$ in exactly $v$, with the property that all other
vertices in $f'$ are either red or have rank lower than $v$. In that case, it is possible that all
the originally blue vertices in $f'$ are flipped, becoming red. This type of edge will remain
monochromatic in the final coloring.

Each vertex has two variables associated with it: its (original) color and its rank $\rho_v$. We use
the MT algorithm to select both values.

We will translate this into the LLL framework in a somewhat unusual way. We define a bad
event $B^{\mathrm{blue}}(f,f')$ to mean that the above event occurred and the minimum-ranking vertex
in $f$ had rank $\le R$, where $R \in [0,1]$ is a fixed rank threshold. We define a bad event
$B^{\mathrm{blue}}(f)$ to mean that edge $f$ was originally blue and all vertices in it had rank
$> R$. We similarly define $B^{\mathrm{red}}(f)$ and $B^{\mathrm{red}}(f,f')$. Note that
the algorithm fails iff at least one of the four types of bad events occurs. The reason we
distinguish the two cases of the minimum-ranking vertex in $f$ is that when this rank is large,
fixing $f$ will typically break many edges $f'$, so it is not beneficial to take a union bound over
all such $f'$.

We now use the asymmetric LLL. For an event $B(f)$, we assign $\mu(B(f)) = \sqrt{e p_1}$, and for an
event $B(f,f')$ we assign $\mu(B(f,f')) = e p_2$, where $p_1 = P_\Omega(B(f))$ and $p_2 = P_\Omega(B(f,f'))$.

Let us first compute $p_1$. For an event $B^{\mathrm{blue}}(f)$, it must occur that all the vertices
in $f$ are blue and have rank $> R$; this occurs with probability $p_1 = 2^{-k}(1-R)^k$.

Next, let us compute $p_2$. Suppose $f, f'$ intersect in $v$. For an event $B^{\mathrm{blue}}(f,f')$,
it must occur that all vertices in $f$ are blue; this occurs with probability $2^{-k}$. All the
vertices in $f$, other than $v$, must have rank exceeding that of $v$; this occurs with probability
$(1-\rho_v)^{k-1}$. All the vertices in $f'$, other than $v$, must be either red or have rank less
than $v$; this occurs with probability $(1/2 + \rho_v/2)^{k-1}$. Hence, integrating over
$\rho_v \in [0,R]$, we have

\begin{align*}
p_2 &\le \int_{0}^{R} 2^{-k} (1-\rho_v)^{k-1} (1/2 + \rho_v/2)^{k-1} \, d\rho_v \\
&= 2^{1-2k} \int_{0}^{R} (1-\rho_v)^{k-1} (1+\rho_v)^{k-1} \, d\rho_v \\
&\le 2^{1-2k} R
\end{align*}
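
As a sanity check on the last inequality, the following sketch compares a midpoint-rule estimate of the integral with the closed-form bound $2^{1-2k}R$. The values of $k$ and $R$ used are illustrative assumptions, not the parameters of the analysis, and the helper names are hypothetical.

```python
def p2_integral(k, R, steps=10_000):
    """Midpoint-rule estimate of the integral bounding p_2."""
    h = R / steps
    total = 0.0
    for j in range(steps):
        rho = (j + 0.5) * h
        total += 2.0 ** (-k) * (1 - rho) ** (k - 1) * (0.5 + 0.5 * rho) ** (k - 1)
    return total * h

def p2_closed_bound(k, R):
    """The bound 2^(1-2k) * R, valid because (1 - rho^2)^(k-1) <= 1."""
    return 2.0 ** (1 - 2 * k) * R

# Illustrative parameter choices only.
for k in (3, 5, 10):
    for R in (0.1, 0.5, 1.0):
        assert p2_integral(k, R) <= p2_closed_bound(k, R)
```

The integrand equals $2^{1-2k}(1-\rho_v^2)^{k-1}$, so the bound is tight only for small $R$; for larger $R$ the integral is noticeably smaller.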

Finally, we need to analyze the dependency. Consider an edge $f$; let us define

\[ t = \prod_{B} (1 + \mu(B)) \]

where $B$ ranges over all bad events touching $f$. One can verify there are at most $2L$ events of
type $B(f')$ (one for each color) and at most $4L^2$ events of type $B(f',f'')$ (either $f'$ or $f''$
could touch $f$, and there are two possible colors). Hence we have

\[ t \le (1 + \sqrt{e p_1})^{2L} (1 + e p_2)^{4L^2} \le \exp(2L\sqrt{e p_1} + 4L^2 e p_2) \]

The LLL criterion is now

\[ \sqrt{p_1 e} \ge p_1 t, \qquad p_2 e \ge p_2 t^2 \]

which can be seen to be satisfied for $L \le 0.17\sqrt{k/\ln k}\,2^k$ and $k$ sufficiently large. In
this case we also have $t \le O(1)$.

A data-structure to find bad-events. Now that we have formulated this problem for the 
LLL, we come to the core algorithmic challenge: finding bad-events efficiently. For this, we will need 
a data-structure D to track the following information: for each vertex v, we use a doubly-linked 
list to enumerate all monochromatic edges which contain v. 
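
A minimal sketch of such a structure follows. It is an illustration under stated assumptions rather than the paper's implementation: Python sets stand in for the doubly-linked lists (both give constant expected-time insertion and deletion), ranks are omitted, and the class name `MonoEdgeIndex` and its methods are hypothetical.

```python
class MonoEdgeIndex:
    """For each vertex, tracks the monochromatic edges containing it."""

    def __init__(self, n, edges, color):
        self.edges = edges
        self.color = color                      # shared, mutated by recolor()
        self.edges_at = [[] for _ in range(n)]  # all edges containing v
        self.mono_at = [set() for _ in range(n)]  # monochromatic edges containing v
        for idx, e in enumerate(edges):
            for v in e:
                self.edges_at[v].append(idx)
            if self._is_mono(idx):
                self._insert(idx)

    def _is_mono(self, idx):
        return len({self.color[v] for v in self.edges[idx]}) == 1

    def _insert(self, idx):                     # O(k) work for an edge of size k
        for v in self.edges[idx]:
            self.mono_at[v].add(idx)

    def _delete(self, idx):                     # O(k) work for an edge of size k
        for v in self.edges[idx]:
            self.mono_at[v].discard(idx)

    def recolor(self, v, new_color):
        """Change one vertex color and repair the lists of the edges containing v."""
        self.color[v] = new_color
        for idx in self.edges_at[v]:
            if self._is_mono(idx):
                self._insert(idx)
            else:
                self._delete(idx)

    def mono_neighbors(self, idx):
        """Monochromatic edges meeting edge idx; each is encountered at most k times."""
        found = set()
        for v in self.edges[idx]:
            found |= self.mono_at[v]
        return found
```

The point of `mono_neighbors` is that its cost scales with the number of monochromatic edges meeting the given edge rather than with the size of its whole neighborhood.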

For any edge $f$ and vertex-coloring $X$, we let $A(f,X)$ be the event that $f$ is monochromatic on
$X$.

Proposition 3.6. The data-structure $D$ allows us to find bad-events with an event-decomposition

\[ D(B,X) \le k^{O(1)} \sum_{f \sim B} \sum_{g \in N(f)} \Big( 1 + \sum_{g' \in N(g)} \big( [A(g,X)] + [A(g',X)] \big) \Big) \]

Proof. To simplify the notation, we write $f \sim B$ if $f$ is involved in $B$; that is, if $B$ is of
the form $B(f)$ or $B(f,f')$.

First, we consider the cost to update the list of monochromatic edges. If an edge $f$ was originally
monochromatic and is resampled, we delete it from the $k$ corresponding vertex-lists; that takes time
$O(k)$. If an edge $f$ becomes monochromatic, we add it to the $k$ corresponding lists, again in time
$O(k)$. The only edges which can change their status are those intersecting $B$, and so this is at
most $O(k) \sum_{f \sim B} |N(f)|$.

Next, we show how to find the bad events caused by resampling some edge $f$. To find an event
of type $B(g)$ affected by $f$, we simply loop over all the monochromatic edges $g$ intersecting $f$,
and check if they also satisfy the property that $\rho(w) > R$ for all $w \in g$; this takes time
$\sum_{g \in N(f)} k^{O(1)}$.

Next, we search for events $B(g,g')$ in the configuration $X$, where $g \in N(f)$: we begin by looping
over all edges $g \in N(f)$. If $g$ is monochromatic on $X$, we loop over all $g' \in N(g)$ and check
whether $B(g,g')$ is true on $X$. The total work for this is

\[ k^{O(1)} \sum_{g \in N(f)} \big( 1 + [A(g,X)]\,|N(g)| \big) \]

Finally, consider how to find an event $B(g,g')$, where now $g' \in N(f)$. We begin by looping over
$g' \in N(f)$; for each such edge $g'$, we want to find any edges $g$ where $B(g,g')$ is true. Let
$G(g')$ denote the edges $g \in N(g')$ which are monochromatic on $X$. We make the critical
observation that we can use our data-structure to enumerate, for each $v \in g'$, all the
monochromatic edges including $v$, and so each $g \in G(g')$ is listed at most $k$ times. Thus, the
total work to enumerate $G(g')$ is at most $k|G(g')|$; this is potentially much smaller than
$|N(g')|$. Hence, the work for this step is

\[ k^{O(1)} \sum_{g' \in N(f)} \big( 1 + |G(g')| \big) \]

Putting all these terms together, we have that the total work expended searching for bad-events
caused by resampling $f$ is at most

\begin{align*}
\mathrm{Time} &\le k^{O(1)} \sum_{g \in N(f)} \big( 1 + |G(g)| + [A(g,X)]\,|N(g)| \big) \\
&= k^{O(1)} \sum_{g \in N(f)} \Big( 1 + \sum_{g' \in N(g)} \big( [A(g,X)] + [A(g',X)] \big) \Big)
\end{align*}

Summing over all $f \sim B$, we have that

\[ D(B,X) \le k^{O(1)} \sum_{f \sim B} \sum_{g \in N(f)} \Big( 1 + \sum_{g' \in N(g)} \big( [A(g,X)] + [A(g',X)] \big) \Big) \]

□

Proposition 3.7. The expected total time for the MT algorithm to find a coloring is at most
$mk^{O(1)}$.

Proof. We apply Theorem 3.5 to the event-decomposition of Proposition 3.6. For any bad-event
$B(f)$, we have

\[ T_{B(f)} \le k^{O(1)} \Big( L + \sum_{g \in N(f),\, g' \in N(g)} \big( \theta(A(g)) + \theta(A(g')) \big) \Big) \]

For any edge $g$, we have $P_\Omega(A(g)) = 2^{-k}$ and so $\theta(A(g)) \le P_\Omega(A(g)) \times t \le O(2^{-k})$.
Thus, we have that

\[ T_{B(f)} \le k^{O(1)} \Big( L + \sum_{g \in N(f),\, g' \in N(g)} \big( O(2^{-k}) + O(2^{-k}) \big) \Big) \le k^{O(1)} (L + 2^{-k} L^2) \le L k^{O(1)} \]

Hence, the total expected work for this bad event $B(f)$, over the entire execution of MT, is at
most $\mu(B(f)) T_{B(f)} \le \sqrt{e p_1}\, L k^{O(1)} \le k^{O(1)}$; summing over all edges $f$ gives a
total time of $mk^{O(1)}$.

A similar argument applies to the events $B(f,f')$, giving total expected work at most $mk^{O(1)}$,
and to bound the time required to initialize the data structure. Recalling that
$k = \log^{O(1)} m$, this proves the proposition. □

4 Latin transversals 

Suppose we are given an $n \times n$ matrix $A$, in which each cell is assigned a color. Suppose that
each color appears at most $\Delta \le (27/256)n$ times in the matrix. We wish to select a permutation
$\pi \in S_n$ with the property that no color appears twice, that is, there are no distinct $x, x'$
with the property that $A(x,\pi(x)) = A(x',\pi(x'))$. Such a permutation is referred to as a Latin transversal;
see mmm for some of the long history behind this and related notions. 

One can apply the Lopsided LLL to the probability space defined by a random permutation. In
this context, a bad-event is that we have $\pi(x) = y \wedge \pi(x') = y'$ where $A(x,y) = A(x',y')$.
In [15], it is shown that two events are dependent for this probability space (in the sense of the
lopsided LLL) iff they overlap in a row or column of the matrix.

In [24], a variant of the MT algorithm was presented for finding such permutations in polynomial
time. The algorithm is somewhat complicated to describe, but the basic idea of this algorithm is 
that one can resample bad-events by performing random swaps of the relevant permutation entries. 
These random swaps play the same role as a resampling in the usual MT algorithm. 

Although this algorithm and its analysis are much more complicated than the standard MT 
algorithm, one can still develop witness trees and show that the Witness Tree Lemma holds. This
implies that all the results about the MT-distribution do as well. This is one of the key advantages of 
the proof-technique developed in [24]; later works, such as [T] and [25], have developed substantially 
simpler and more general proofs of the convergence of the swapping MT algorithm, but these 
approaches do not extend to the MT-distribution results. 

Theorem 4.1. Suppose each color appears at most $\Delta \le (27/256)n$ times in the matrix $A$. Then
there is an algorithm to find a Latin transversal in expected time $O(n)$ assuming that we have fast
read access to the matrix, namely:

(A1) The entries of $A$ allow random-access reads.

(A2) The colors of $A$ can be represented as bit-strings of length $O(\log n)$.

(A3) Our algorithm can perform elementary arithmetic operations on words of size $O(\log n)$ in
time $O(1)$.

Note that the input size to the problem is $O(n^2)$.

Proof. Each bad event $B$ has probability $p = \frac{1}{n(n-1)}$. It is shown in [24] that the
asymmetric LLL criterion holds with these parameters and that $\mu(B) = O(p)$ for any bad-event $B$.
For any $x, y \in [n]$ and any bad-event $B$, we say that $B$ involves $x$ or $y$ if $B$ contains a
condition of the form $\pi(x) = y'$ or of the form $\pi(x') = y$. We define
$w(x,y) = \prod_{B \text{ involves } x \text{ or } y} (1 + \mu(B))$.

We can enumerate such events as follows: there are $2n - 1$ choices for the first cell involving
column $x$ or row $y$, and $\Delta \le O(n)$ choices for the other cell with the same color. So there
are $O(n^2)$ such bad events, and for each such bad event $B$ we have $\mu(B) = O(n^{-2})$, so in
total $w(x,y) = O(1)$.

Now consider the following data-structure $D$. We first choose some pairwise-independent hash
function $H$, uniformly mapping the labels of colors to the set $[n]$ [9]. We will maintain a list,
for each $t \in [n]$, of all pairs $(x,y)$ with $\pi(x) = y$ and $H(A(x,y)) = t$. These can be
maintained with a doubly-linked list for each element $t \in [n]$ in the range of $H$. We will update
this structure during the execution of the Swapping Algorithm; for example, if $\pi(x) = y$ and we
resample to a new permutation $\pi'$ with $\pi'(x) = y'$, we would remove the pair $(x,y)$ from the
list corresponding to $H(A(x,y))$ and add the pair $(x,y')$ to the list corresponding to
$H(A(x,y'))$. It is not hard to see how to add and remove pairs from their appropriate list in
constant time.
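
The bucket structure can be sketched as follows. This is an illustration only, under several assumptions that are not from the paper: colors are non-negative integers below the prime `p`, Python sets replace the doubly-linked lists, and the hash $h(c) = ((ac + b) \bmod p) \bmod n$ is a Carter-Wegman style family that is only approximately uniform after the final reduction modulo $n$. The class name `ColorBuckets` and its methods are hypothetical.

```python
import random

class ColorBuckets:
    """Buckets the selected cells (x, pi(x)) by a hash of their color."""

    def __init__(self, A, perm, seed=0):
        self.A = A
        self.perm = list(perm)
        self.n = len(A)
        rng = random.Random(seed)
        self.p = 2_147_483_647               # prime; colors are assumed to be smaller
        self.a = rng.randrange(1, self.p)
        self.b = rng.randrange(self.p)
        self.bucket = [set() for _ in range(self.n)]
        for x, y in enumerate(self.perm):
            self.bucket[self._h(A[x][y])].add((x, y))

    def _h(self, c):
        return ((self.a * c + self.b) % self.p) % self.n

    def _set(self, x, y_new):
        # Move the pair for row x from its old bucket to its new bucket.
        y_old = self.perm[x]
        self.bucket[self._h(self.A[x][y_old])].discard((x, y_old))
        self.perm[x] = y_new
        self.bucket[self._h(self.A[x][y_new])].add((x, y_new))

    def swap(self, x1, x2):
        """Exchange the images of x1 and x2, keeping perm a permutation."""
        y1, y2 = self.perm[x1], self.perm[x2]
        self._set(x1, y2)
        self._set(x2, y1)

    def conflicts(self, x):
        """Selected cells sharing the color of (x, pi(x)); only one bucket is scanned."""
        y = self.perm[x]
        c = self.A[x][y]
        return [(x2, y2) for (x2, y2) in self.bucket[self._h(c)]
                if x2 != x and self.A[x2][y2] == c]
```

Hash collisions between distinct colors only cost extra comparisons inside `conflicts`; they never produce a false bad event, because the colors themselves are compared before a pair is reported.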

Now consider the work required in a single step of $D(B,X)$. The operation of adding and
removing pairs from their corresponding linked-lists takes $O(1)$ time. The costly operation is
that, for each affected position $x$ in the permutation, we must loop over all pairs $x, x'$ with
$H(A(x,\pi(x))) = H(A(x',\pi(x')))$ and test whether $A(x,\pi(x)) = A(x',\pi(x'))$. If the latter
holds, then we have detected a new bad event.

Thus, suppose we resample $B = (\pi(x_1) = y_1) \wedge (\pi(x_2) = y_2)$, obtaining the new
permutation $\pi'$. There are four positions in the permutation $\pi'$ that differ from $\pi$, and we
must test each of these to see if there are new bad events. Thus, the time to update $D$ is given by

\[ \sum_{y_1' \in [n],\; x_3 \ne x_1,\; y_3 \ne y_1'} \big[ \pi'(x_1) = y_1' \wedge \pi'(x_3) = y_3 \wedge H(A(x_1,y_1')) = H(A(x_3,y_3)) \big] + \cdots \]

(Here, we have only written one of the four summands, corresponding to new bad events involving
$\pi'(x_1) = y_1'$. The other three summands are analogous, and will have the same cost.)

By 2-independence of $H$, we have that the expected time to update $D$ from a bad-event $B$ is

\[ \sum_{y_1' \in [n],\; x_3 \ne x_1,\; y_3 \ne y_1'} \big[ \pi'(x_1) = y_1' \wedge \pi'(x_3) = y_3 \big] \times \big( 1/n + [A(x_1,y_1') = A(x_3,y_3)] \big) + \cdots \]

This expectation is taken over the hash function H, not on any of the random choices during 
the MT algorithm. Thus, the permutations should be viewed as fixed values and not random 
variables. 

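We can now apply Theorem 3.5 to calculate:

\begin{align*}
T_B &= \sum_{y_1',\; x_3 \ne x_1,\; y_3 \ne y_1'} \theta\big( \pi'(x_1) = y_1' \wedge \pi'(x_3) = y_3 \big) \big( 1/n + [A(x_1,y_1') = A(x_3,y_3)] \big) \\
&\le \sum_{y_1',\; x_3 \ne x_1,\; y_3 \ne y_1'} P_\Omega\big( \pi'(x_1) = y_1' \wedge \pi'(x_3) = y_3 \big)\, w(x_1,y_1')\, w(x_3,y_3) \big( 1/n + [A(x_1,y_1') = A(x_3,y_3)] \big)
\end{align*}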

Using the fact that there are at most $\Delta n = O(n^2)$ values of $y_1', x_3, y_3$ with
$A(x_1,y_1') = A(x_3,y_3)$, and our bounds $w(x,y) \le O(1)$, we calculate that $T_B \le O(1)$.

Thus, the expected running time of MT is

\[ \sum_B \mu(B) T_B \le O(1) \sum_{\substack{x,y,x',y' \\ A(x,y) = A(x',y')}} \mu\big( \pi(x) = y \wedge \pi(x') = y' \big) = O(n). \]

A similar calculation shows an $O(n)$ time to initialize $D$. □

5 Non-repetitive vertex coloring: from exponential to polynomial

So far, we have examined problems in which good data structures can lead to polynomial
improvements in the MT runtime. However, Theorems 3.1 and 3.5 are much more powerful, and can indeed
transform exponential-time algorithms to polynomial-time ones. We will consider a series of
related problems based on non-repetitive vertex coloring of graphs. These represent some of the few
remaining cases in which the LLL provides a proof of existence, but for which we do not know
a corresponding polynomial-time algorithm.

Given a graph $G$, we seek to color its vertices so that no color sequence appears repeated in
any vertex-simple path; i.e., there is no simple path colored $xx$, where $x$ can denote any nonempty
sequence of colors. How many colors are needed in order to ensure such a coloring exists? This is
known as the Thue number $\pi(G)$ of $G$, motivated by Thue's classical result that $\pi$ is at most 3
for paths of any length [39].$^4$

The problems of finding non-repetitive colorings and Thue numbers have been studied
extensively in a variety of contexts. In [4], it was shown via the LLL that for any graph $G$ with
maximum degree $\Delta$, $\pi(G) = O(\Delta^2)$. The original constant term in that paper was not
tight; a variety of further papers such as HHilEJGo! have brought it down further. The best
currently-known bound is that $\pi(G) \le (1 + o(1))\Delta^2$ [IT]. The analysis of [IT] does not use
the LLL; it uses a non-constructive Kolmogorov-complexity argument which is somewhat complicated and
specialized to the graph-coloring problem.

While the MT resampling framework applies to this problem, the key bottleneck is to either 
find a bad event (a path with repeated colors), or to certify that none such exists. In this case, the 
number of bad events is exponentially large; more seriously, it is NP-hard to even detect whether 
a given coloring has a repeated color sequence m- So, in this situation it is intractable to find a 
data-structure for finding bad-events with good worst-case run-time bounds. 

In [19], a constructive algorithm was introduced using $C = \Delta^{2+\epsilon}$ colors (i.e., if a
slack $\Delta^{\epsilon}$ is allowed). The basic idea of [19] is to apply the MT algorithm, but to
ignore the long paths. This algorithm succeeds in finding a good coloring with high
probability,$^5$ and the running time is $n^{O(1/\epsilon)}$, which is polynomial time for fixed
$\epsilon$. This cannot be amplified to succeed with probability 1, as it is not clear how to test
whether the output of the algorithm is a good coloring. Thus, it is a Monte Carlo, but not a Las
Vegas, algorithm.

5.1 New results 

We present the first polynomial-time coloring algorithm that shows $\pi(G) \le (1 + o(1))\Delta^2$;
furthermore, our algorithm is Las Vegas. Until this work, no Las Vegas algorithms were known for this
problem where the number of colors $C$ is any function of $\Delta$, and no Monte Carlo algorithms
were known where $C = \phi\Delta^2$ for $\phi$ any fixed constant. We also develop the first-known ZNC
(parallel Las Vegas) versions of such results.

4 There are a few variants on this definition such as whether the edges or vertices are colored, and whether each has
its own palette of colors or whether there is a common palette. For concreteness, we color vertices from a common
palette; all of our bounds would apply to the other scenarios as well. We assume that the graph $G$ is simple with
$2 \le \Delta \le n - 1$.

5 We say an event occurs with high probability (abbreviated whp) if it occurs with probability $1 - n^{-\Omega(1)}$.

As another application, Section 5.4 considers a generalization of non-repetitive colorings,
introduced in [3], to avoid $k$-repetitions. That is, given an integer parameter $k \ge 2$, we aim to
color the vertices to avoid the event that a sequence of colors $xx \dots x$ appears on a
vertex-simple path, with the string $x$ occurring $k$ times. (Standard non-repetitive coloring
corresponds to $k = 2$.) The best type of result achievable in polynomial time using m is a coloring
using $O(\Delta^{2+\epsilon})$ colors, for any desired constant $\epsilon > 0$. Theorem 5.7 gives a
Monte Carlo algorithm to find a coloring using
$C = \Delta^{1+\frac{1}{k-1}} + O(\Delta^{\frac{2}{3}+\frac{1}{k-1}})$ colors and which avoids any
$k$-repetitions, running in polynomial time.

A second type of generalization of non-repetitive colorings comes from work of EZ3, which 
considered when it is possible to avoid nearly-repeated color sequences; that is, a sequence of colors 
xy where the Hamming distance of x and y is small. The work of m considered the problem for 
coloring paths. In Section 5.5, we extend this to general graphs. This presents new algorithmic
challenges as well. 


5.2 Non-repetitive vertex coloring 


Proposition 5.1. There is some constant $\phi > 0$, such that for any graph $G$ of maximum degree
$\Delta$, there is a non-repetitive vertex coloring with $C = \Delta^2 + \phi\Delta^{5/3}$ colors.


Proof. We show this via the LLL. A bad-event in this context is some vertex-simple path with
a repeated color sequence, of length $2l$. We define $\mu(B) = \alpha^{2l}$ for all such events,
where $\alpha$ is a parameter to be determined. Our convention is that each color sequence gives rise
to a distinct bad-event; thus, all bad-events are atomic and have probability $C^{-2l}$.

Now consider a fixed vertex $v$, and let us consider the sum $\mu(v)$ of $\mu(B)$ over all bad-events
$B$ which involve vertex $v$. Such bad-events have the following form: there is a path of length
$2l$, of which $v$ is the $t$-th vertex for some $t = 0, \dots, l-1$ (by reversing the path, one can
assume without loss of generality that $v$ comes in the initial half); the first $l$ vertices have
some pattern of colors, and the final $l$ vertices also have this pattern.

Summing over all possible values of $t, l$, all $\Delta^{2l-1}$ paths, and all possible $C^l$ color
patterns, we have

\[ \mu(v) \le \sum_{l=1}^{\infty} \sum_{t=1}^{l} \Delta^{2l-1} C^l \alpha^{2l} \le \frac{\alpha^2 C \Delta}{(1 - \alpha^2 C \Delta^2)^2} \qquad \text{for } \alpha^2 C \Delta^2 < 1 \]


To show that the asymmetric LLL criterion holds, consider some bad-event $B$ defined by a path
$v_0, \dots, v_{2l-1}$. Its probability is $C^{-2l}$. Its independent sets of neighbors can be
determined by, for each $i = 0, \dots, 2l-1$, selecting zero or one bad-events involving $v_i$. Thus,
we have that

\[ \sum_{\substack{I \subseteq N(B) \\ I \text{ independent}}} \prod_{B' \in I} \mu(B') \le \prod_{i=0}^{2l-1} (1 + \mu(v_i)) \le \Big( 1 + \frac{\alpha^2 C \Delta}{(1 - \alpha^2 C \Delta^2)^2} \Big)^{2l} \]


Thus, the LLL criterion becomes

\[ \alpha^{2l} \ge C^{-2l} \Big( 1 + \frac{\alpha^2 C \Delta}{(1 - \alpha^2 C \Delta^2)^2} \Big)^{2l} \]

which is satisfied for all $l \ge 1$ iff

\[ \alpha C \ge 1 + \frac{\alpha^2 C \Delta}{(1 - \alpha^2 C \Delta^2)^2} \tag{3} \]

Set $\alpha = (\sqrt{C}(\Delta + \Delta^{2/3}))^{-1}$; routine algebra shows that (3) holds for $\phi$
sufficiently large.

The challenge is to turn this existential proof into an efficient algorithm. The key bottleneck is
to search for some true bad event; we will do so via Theorem 3.5. The following intermediate result
will be useful. (Recall the definition of $\theta$ from (2).)

Proposition 5.2. Suppose we have any event $E$ of the form
$\chi(v_1) = c_1 \wedge \chi(v_2) = c_2 \wedge \dots \wedge \chi(v_k) = c_k$, where $v_1, \dots, v_k$
are distinct vertices and $c_1, \dots, c_k$ are color labels. Then we have that

\[ \theta(E) \le \alpha^k \]

where $\alpha = (\sqrt{C}(\Delta + \Delta^{2/3}))^{-1}$.

Suppose we have any event $E'$ of the form
$\chi(v_1) = \chi(u_1) \wedge \dots \wedge \chi(v_k) = \chi(u_k)$, where
$v_1, \dots, v_k, u_1, \dots, u_k$ are distinct vertices. Then we have

\[ \theta(E') \le \beta^k \]

where $\beta = (\Delta + \Delta^{2/3})^{-2}$.

Proof. The event $E$ has probability $P_\Omega(E) = C^{-k}$. To form an independent set of neighbors
of $E$, one may select, for each $i = 1, \dots, k$, one or zero paths including $v_i$. We have
already computed this sum in Proposition 5.1, and so we have that the sum over all such independent
sets is at most

\[ \Big( 1 + \frac{\alpha^2 C \Delta}{(1 - \alpha^2 C \Delta^2)^2} \Big)^{k} \]

Because the LLL criterion is satisfied, we have that this is at most $(\alpha C)^k$. Thus, overall we
have

\[ \theta(E) \le C^{-k} \times (\alpha C)^k = \alpha^k \]

The bound on $E'$ follows by taking a union bound over all possible colors $c_1, \dots, c_k$ and
computing the probability that
$\chi(v_1) = c_1 = \chi(u_1) \wedge \dots \wedge \chi(v_k) = c_k = \chi(u_k)$. □

In Theorem 5.4, we will show via Theorem 3.5 that the coloring can be found in $O(n^2)$ time
using the DFS MT algorithm. As a warm-up exercise, we begin with a slightly weaker result; we
use Theorem 3.1 to produce the coloring in $\mathrm{poly}(n)$ time.

Theorem 5.3. The coloring of Proposition 5.1 can be found in expected time $O(n^3 \Delta^{4/3})$.

Proof. We construct a search algorithm to find bad-events which are currently true. We suppose
that $C \le n$, as otherwise this is trivial (assign each vertex a distinct color).

To begin, we sort all the neighborhoods of every vertex by color. As the number of colors is
$O(n)$, this step can be implemented in $O(n^2)$ time.

Now, suppose we want to find a vertex sequence $v_0, \dots, v_{2l-1}$ of length $2l$, where $l$ is
fixed. We construct a branching process for $i = 0, \dots, l-1$, wherein in stage $i$ we enumerate
over possible values for $v_i, v_{l+i}$. In order for these to correspond to a bad-event, it must be
that $\chi(v_i) = \chi(v_{l+i})$. Furthermore, $v_i, v_{l+i}$ must be neighbors of
$v_{i-1}, v_{l+i-1}$ respectively (unless $i = 0$). Finally, all the vertices
$v_0, \dots, v_{2l-1}$ must be distinct.
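
The branching process for a fixed $l$ can be sketched as below. This is a hedged illustration, not the paper's code: the graph is assumed to be given as adjacency lists over vertices $0, \dots, n-1$, a per-vertex dictionary keyed by color plays the role of the color-sorted adjacency lists, and the function name `find_repetition` is hypothetical.

```python
from collections import defaultdict

def find_repetition(adj, color, l):
    """Return a simple path v_0..v_{2l-1} whose color sequence has the form xx, or None."""
    n = len(adj)
    # Neighbors grouped by color (stand-in for adjacency lists sorted by color).
    by_color = []
    for v in range(n):
        groups = defaultdict(list)
        for u in adj[v]:
            groups[color[u]].append(u)
        by_color.append(groups)

    def extend(first, second):
        # Story: first = [v_0..v_{i-1}], second = [v_l..v_{l+i-1}], colors matching pairwise.
        i = len(first)
        if i == l:
            # The two halves must join up: v_{l-1} has to be adjacent to v_l.
            return first + second if second[0] in adj[first[-1]] else None
        used = set(first) | set(second)
        for a in adj[first[-1]]:
            if a in used:
                continue
            # Only neighbors of v_{l+i-1} with the color of a are enumerated.
            for b in by_color[second[-1]].get(color[a], ()):
                if b in used or b == a:
                    continue
                found = extend(first + [a], second + [b])
                if found is not None:
                    return found
        return None

    for v0 in range(n):
        for vl in range(n):
            if vl != v0 and color[vl] == color[v0]:
                found = extend([v0], [vl])
                if found is not None:
                    return found
    return None
```

A story is abandoned as soon as no pair of same-colored, unused neighbors extends it, which is what keeps the expected work small on the configurations that arise during MT.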

Because we have sorted the adjacency lists of all the vertices by color, for $i > 0$ and a fixed
sequence $v_0, \dots, v_{i-1}, v_l, \dots, v_{l+i-1}$ one can enumerate over $v_i, v_{l+i}$ in time

\[ \sum_{v_i \in N(v_{i-1})} \Big( 1 + \sum_{v_{l+i} \in N(v_{l+i-1})} [\chi(v_{l+i}) = \chi(v_i)] \Big) \tag{4} \]

(In this sum, and all the sums we encounter, we enforce the requirement that the vertices are
distinct; we do not write this explicitly, to simplify the notation.)

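Summing over all possible choices for $v_0, \dots, v_{i-1}, v_l, \dots, v_{l+i-1}$, the overall time
is given by

\[ \sum_{v_0, \dots, v_{i-1}, v_l, \dots, v_{l+i-1}} [\chi(v_0) = \chi(v_l) \wedge \dots \wedge \chi(v_{i-1}) = \chi(v_{l+i-1})] \sum_{v_i \in N(v_{i-1})} \Big( 1 + \sum_{v_{l+i} \in N(v_{l+i-1})} [\chi(v_{l+i}) = \chi(v_i)] \Big) \]

Similarly, for $i = 0$, we can do this in time

\[ \sum_{v_0} \Big( 1 + \sum_{v_l} [\chi(v_l) = \chi(v_0)] \Big) \]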

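Thus, summing over $i = 0, \dots, l-1$ and $l = 0, \dots, n$, we have an event decomposition of the
form

\begin{align*}
\mathrm{Time} \le n^2 + \sum_{l=0}^{n} \Big( & \sum_{v_0} \Big( 1 + \sum_{v_l} [\chi(v_l) = \chi(v_0)] \Big) \\
& + \sum_{i=1}^{l-1} \sum_{v_0, \dots, v_{i-1}, v_l, \dots, v_{l+i-1}} [\chi(v_0) = \chi(v_l) \wedge \dots \wedge \chi(v_{i-1}) = \chi(v_{l+i-1})] \\
& \qquad \times \sum_{v_i \in N(v_{i-1})} \Big( 1 + \sum_{v_{l+i} \in N(v_{l+i-1})} [\chi(v_{l+i}) = \chi(v_i)] \Big) \Big)
\end{align*}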

We evaluate $T$ as in Theorem 3.1. For each value of $l$, the term
$\sum_{v_0} (1 + \sum_{v_l} [\chi(v_l) = \chi(v_0)])$ contributes
$n + \sum_{v_0, v_l} \theta(\chi(v_0) = \chi(v_l))$; by Proposition 5.2, the latter has value at most
$n^2 \beta$. Similarly, each of the terms

\[ \sum_{v_0, \dots, v_{i-1}, v_l, \dots, v_{l+i-1}} [\chi(v_0) = \chi(v_l) \wedge \dots \wedge \chi(v_{i-1}) = \chi(v_{l+i-1})] \sum_{v_i \in N(v_{i-1})} \Big( 1 + \sum_{v_{l+i} \in N(v_{l+i-1})} [\chi(v_{l+i}) = \chi(v_i)] \Big) \]

contributes $n^2 \Delta^{2i-1} \beta^{i} + n^2 \Delta^{2i} \beta^{i+1}$.

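Summing over $l, i$, we have

\begin{align*}
T &\le n^2 + \sum_{l=0}^{n} \Big( n^2 \beta + \sum_{i=0}^{l-1} \big( n^2 \Delta^{2i-1} \beta^{i} + n^2 \Delta^{2i} \beta^{i+1} \big) \Big) \\
&\le O(n^2) \Big( 1 + \sum_{l=0}^{n} \sum_{i=0}^{l-1} \beta^{i} \Delta^{2i+1} \Big) \\
&\le O(n^2) \Big( 1 + n \sum_{i=0}^{\infty} \beta^{i} \Delta^{2i+1} \Big) \\
&= O(n^2 \Delta^{4/3})
\end{align*}

Next, observe that the total sum of $\mu(B)$ over all $B \in \mathcal{B}$ is at most
$\sum_{v} \sum_{B \text{ involves } v} \mu(B) \le n \alpha C \le O(n)$. Thus, the overall time is at
most $(1 + \sum_{B} \mu(B))\, T \le O(n) \times O(n^2 \Delta^{4/3})$. □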

We want to emphasize the intuition here, which is that searching for a repetitive coloring in
the intermediate configurations of the MT algorithm is very similar to searching for a repetitive
coloring in a completely random configuration. One could compute the expected running time of
this branching algorithm on such a random coloring. This would give identical formulas, with the
only difference being that all instances of $\alpha$ in the above proof would be replaced by the
slightly smaller value $C^{-1}$, the probability that a given vertex has a given color.

We next improve on this by using depth-first search for MT, as well as being slightly more 
careful in our search algorithm. 

Theorem 5.4. The coloring of Proposition 5.1 can be found in expected time $O(n^2)$.

Proof. We assume throughout that $\Delta < \sqrt{n}$, as otherwise this is trivial (simply assign
each vertex a unique color).

We will maintain a data structure $D$ in which we maintain the adjacency list of each vertex
sorted by color. This costs $O(n^2)$ to initialize.

Suppose we are given a bad-event $B$, which is a path of vertices $w_0, \dots, w_{2k-1}$ which is
repetitively colored. In order to apply the depth-first-search MT algorithm, we must update $D$ and
identify any bad-events involving any of the vertices $w_0, \dots, w_{2k-1}$. We shall first show
how, given a single vertex $v$, one can update $D$ and identify any bad-events involving $v$. We
shall construct an event-decomposition such that

\[ \text{Time for vertex } v \le \sum_{\text{events } E} c_{v,E} [E(\chi)] \]

where $\chi$ is the coloring after resampling $B$ and $c_{v,E}$ are non-negative constants.

For each such vertex $v$, let us define

\[ T_v = \sum_{\text{events } E} c_{v,E}\, \theta(E) \tag{5} \]

Then by Theorem 3.5 we have

\[ T_B \le T_{w_0} + T_{w_1} + \dots + T_{w_{2k-1}} \]

So, in order to bound $T_B$, it suffices to show an upper bound on $T_v$, for a given vertex $v$.

Thus, suppose we are given a configuration and a fixed vertex $v$, and we wish to update $D$ and
determine if $v$ participates in any paths with repeated colors. We begin by updating the sorted
adjacency lists for each neighbor of $v$; this takes time $O(\Delta^2)$.

Next, say that $v$ participates in a repeated path $v_0, \dots, v_{2l-1}$ of length $2l$, and occurs
in position $t < l$. For the moment, let us suppose that $t = 0$ and $l$ is fixed. To emphasize the
position of $v$ in the list, we write $v_t = v = v_0$.

We will use a branching process similar to Theorem 5.3, in which a story corresponds to a list
of distinct vertices $v_0, v_1, \dots, v_i, v_l, v_{l+1}, \dots, v_{l+i}$ for some $i = 0, \dots, l$.

We begin by looping over the vertex in position $l$, restricting the search to vertices $v_l$ which
have the same color as $v_0$. We also loop over all neighbors $v_1, v_{l+1}$ of $v_0, v_l$
respectively. Again, if they have the same color (and also $v_1 \ne v_{l+1}$), then we continue the
search; otherwise we abort. We continue this process, looping over pairs of vertices
$v_2, \dots, v_{l-1}, v_{l+2}, \dots, v_{2l-1}$. At each stage of this branching process, we insist
that the colors in the path are repeated up to that point, and all vertices are distinct. At the
end, we examine if the resulting path corresponds to a bad event. We can do a similar procedure if
$t \ne 0$; we begin by guessing vertices $v_{t+1}, \dots, v_{l-1}, v_{t+l}, \dots, v_{2l-1}$
and then branch backward on $v_{t-1}, \dots, v_0, v_{l+t-1}, \dots, v_l$.

As in Theorem 5.3, we can perform this enumeration in overall time

\[ \Delta \times \Big( \sum_{v' \ne v} [\chi(v') = \chi(v)] + \sum_{\substack{v' \ne v \\ w \in N(v),\, w' \in N(v')}} [\chi(v') = \chi(v) \wedge \chi(w) = \chi(w')] \Big) \tag{6} \]

where here the terms $w, w'$ indicate potential candidates for $v_1$ and $v_{l+1}$, and $v'$ is a
potential candidate for $v_l$.

By Proposition 5.2, the overall contribution of this expression to (5) is at most

\[ \Delta \times \Big( \sum_{v' \ne v} (\Delta + \Delta^{2/3})^{-2} + \sum_{\substack{v' \ne v \\ w \in N(v),\, w' \in N(v')}} (\Delta + \Delta^{2/3})^{-4} \Big) \]

which is $O(n \Delta^{-1})$.

Continuing in this way, we see that the $r$-th level of this branching process has overall
contribution to (5) of $O(n \Delta^{2r+1} \beta^{r+1})$.

With a little thought, one can see that it is not necessary to specify a fixed value of $l, t$ for
this branching. Once one specifies the initial vertex $v_t$ (without necessarily knowing $t$) and the
corresponding vertex $v_{l+t}$ (again, without necessarily knowing $l$), one merely has to decide how
many steps to branch forward/backward from these two vertices. If at some point during this
branching process one detects a repeated color sequence, one can then infer the corresponding $t, l$.

If one branches $r_1$ forward steps and $r_2$ backward steps, then the contribution of the resulting
work factor to $T_v$ is similarly

\[ O(n \Delta^{2(r_1 + r_2) + 1} \times \beta^{r_1 + r_2 + 1}) \]

Summing over $r_1, r_2$, one has the total work for $v$ is at most

\[ T_v \le \Delta + O\Big( \sum_{r_1 = 0}^{\infty} \sum_{r_2 = 0}^{\infty} n \Delta^{2(r_1 + r_2) + 1} \times \beta^{r_1 + r_2 + 1} \Big) \]

A simple calculation shows this is at most $O(\Delta + n \Delta^{-1/3}) \le O(n)$.

This bound on $T_v$ yields a bound on $T_B$ for any bad-event $B$ which is a path of length $2l$:

\[ T_B \le 2l \times O(n) \]

Summing over all such bad events, we have

\[ \sum_{B} \mu(B) T_B \le \sum_{l=1}^{\infty} n \Delta^{2l-1} C^{l} \alpha^{2l} \times 2l \times O(n) \le O(n^2) \]

□


5.3 Parallel algorithm for the Thue number 

Moser & Tardos introduced in [32] a generic parallel form of their resampling algorithm. This
algorithm can be summarized as follows:

1. Draw $X_1, \dots, X_n$ from $\Omega$.

2. Repeat while there is some true bad event:

3.    Choose (arbitrarily) a maximal independent set $I$ of currently-true bad events $B \in \mathcal{B}$.

4.    Resample all the bad-events $B \in I$ in parallel.
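
The four steps above can be sketched on a toy instance. The sketch below is illustrative only: it assumes the bad events are falsified clauses of a small SAT formula, that two clauses are neighbors when they share a variable, and it uses a sequential greedy pass as a stand-in for Luby's parallel MIS algorithm. The names `violated`, `greedy_mis` and `parallel_mt` are hypothetical.

```python
import random

def violated(clause, x):
    # A clause is a tuple of (variable, wanted_value) literals; its bad
    # event is true when no literal is satisfied by the assignment x.
    return all(x[v] != want for v, want in clause)

def greedy_mis(candidates, clauses):
    # Sequential stand-in for Luby's algorithm: greedily keep clauses that
    # share no variable with a clause already kept (maximal by construction).
    used, chosen = set(), []
    for i in candidates:
        variables = {v for v, _ in clauses[i]}
        if used.isdisjoint(variables):
            chosen.append(i)
            used |= variables
    return chosen

def parallel_mt(n_vars, clauses, rng, max_rounds=10_000):
    x = [rng.random() < 0.5 for _ in range(n_vars)]         # step 1
    for rounds in range(max_rounds):
        bad = [i for i, c in enumerate(clauses) if violated(c, x)]
        if not bad:                                          # step 2
            return x, rounds
        for i in greedy_mis(bad, clauses):                   # step 3
            for v, _ in clauses[i]:                          # step 4
                x[v] = rng.random() < 0.5
    raise RuntimeError("round limit reached")
```

Because the chosen clauses are pairwise variable-disjoint, resampling them one after another within a round is equivalent to resampling them simultaneously.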
As shown in [22], this algorithm will terminate with high probability after $O(\frac{\log n}{\epsilon})$ rounds, as
long as we satisfy a slightly stronger form of the LLL criterion, namely we satisfy it with $\epsilon$-slack.
That is, for each bad-event $B$ we require

$$\mu(B) \ge (1 + \epsilon)\theta(B)$$

for some $\epsilon > 0$. Furthermore, if we can detect the currently-true bad-events in time $O(\log^2 n)$, then
the overall running time is $O(\frac{\log^3 n}{\epsilon})$.

In order to turn this into an efficient randomized algorithm, it suffices to enumerate at each stage
all currently-true bad-events, using polylogarithmic time and polynomial space. (This automatically
implies that there are a polynomial number of true bad-events, and so a maximal independent
set of them can be found efficiently via Luby’s algorithm.)

Proposition 5.5. There is a constant $\phi > 0$ such that any graph $G$ of maximum degree $\Delta$ can
be $C$-colored to avoid repetitive vertex-colorings as long as $C \ge \Delta^2 + \phi\Delta^2/\log\Delta$. Furthermore,
such a coloring can be found in ZNC (Las Vegas NC): the algorithm terminates successfully with
probability 1 after expected time $O(\log^4 n)$ using poly$(n)$ processors.

Proof. Along the same lines as Theorem 5.4, a sufficient condition for the parallel MT algorithm
with $\epsilon$-slack is

Ca ~ (1 - a 2 CA 2 ) 2 ~ nTcT/ ^ 2T<y2T > 1 + 6 ( 7 ) 

and this is satisfied for a = (A 2 + A )^ 1 . 

For $\phi, x$ sufficiently large, the LHS of (7) is a decreasing function of $\Delta$, hence reaches its
minimum value at $\Delta = n$. At this point, one can observe that (7) is satisfied for $\epsilon = \Omega(1/\log n)$.
Thus MT terminates after $O(\log^2 n)$ iterations whp.

Our task becomes to develop a branching process for finding currently-true bad-events, whose
expected number of active stories is bounded by a polynomial and whose running time is
polylogarithmic.

We will use a branching which proceeds through $l = 1, 2, \dots, \log_2 n$ rounds. At each round $l$,
we enumerate all sets of vertices $v_0, \dots, v_{k-1}, w_0, \dots, w_{k-1}$ which satisfy the following conditions:

(B1) $k \le 2^l$

(B2) $\chi(v_0) = \chi(w_0), \dots, \chi(v_{k-1}) = \chi(w_{k-1})$

(B3) $v_0, \dots, v_{k-1}, w_0, \dots, w_{k-1}$ are distinct.

(B4) $v_0, \dots, v_{k-1}$ and $w_0, \dots, w_{k-1}$ are paths.

To extend the set of stories from stage $l$ to stage $l + 1$, we use the following observation: if
$v_0, \dots, v_{k-1}, w_0, \dots, w_{k-1}$ satisfy these conditions at stage $l + 1$, then $v_0, \dots, v_{k/2-1}, w_0, \dots, w_{k/2-1}$
and $v_{k/2}, \dots, v_{k-1}, w_{k/2}, \dots, w_{k-1}$ both satisfy these conditions (separately) for stage $l$. Thus, we
may build the set of all stories satisfying these conditions recursively by pairing stories at stage $l$
and checking if they survive to stage $l + 1$.
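
One doubling step of this pairing can be sketched as follows. This is a simplified illustration rather than the parallel implementation: it handles only the case where $k$ is exactly a power of two, runs sequentially, and represents a story as a pair of vertex tuples. The names `initial_stories` and `pair_stories` are hypothetical.

```python
def initial_stories(vertices, color):
    # Stage 0: single-vertex "paths" v and w with v != w and equal colors.
    return {((v,), (w,)) for v in vertices for w in vertices
            if v != w and color[v] == color[w]}

def pair_stories(stories, adj):
    # One doubling step: glue (P1, Q1) and (P2, Q2) into (P1+P2, Q1+Q2)
    # when both concatenations remain paths (B4) and all vertices stay
    # distinct (B3). Condition (B2) is inherited from the two halves.
    out = set()
    for p1, q1 in stories:
        for p2, q2 in stories:
            if p2[0] not in adj[p1[-1]] or q2[0] not in adj[q1[-1]]:
                continue
            combined = p1 + p2 + q1 + q2
            if len(set(combined)) == len(combined):
                out.add((p1 + p2, q1 + q2))
    return out
```

Each pair of stories is checked independently of every other pair, which is what allows the pairing to be distributed over many processors.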

Furthermore, we see that if there are $V_l^t$ stories satisfying these conditions at each time $t$ and
stage $l$, then for each $l$ this pairing requires $(V_l^t)^2\,\mathrm{poly}(n)$ processors and time $O(\log n)$. Thus, if we show
that $V_l^t \le \mathrm{poly}(n)$ for each $l = 0, \dots, \log_2 n$, then this shows that this process can be implemented
using $O(\log^2 n)$ time and poly$(n)$ processors.

6 Alternatively, [32] shows that the parallel algorithm terminates after $O(\epsilon^{-1} \log \sum_B \mu(B))$ iterations, and one may
show directly in this case that this is $O(\epsilon^{-1} \log n)$. The analysis of [22] shows this directly without needing to compute
$\sum_B \mu(B)$.

Next, we claim that it suffices to show that $E[V_l^t] \le \mathrm{poly}(n)$. For, suppose that $E[V_l^t] \le n^r$.
Then by Markov’s inequality we have that whp $V_l^t \le n^r \times T \times \log_2 n \times n^{100}$. Furthermore, one may
easily detect if $V_l^t$ exceeds this bound; if so, we abort the algorithm and start from scratch.

Finally, we turn to estimating $E[V_l^t]$. Given any fixed sequence $v_0, \dots, v_{k-1}, w_0, \dots, w_{k-1}$
satisfying (B1), (B3), (B4), we may slightly modify the proof of Proposition 5.2 to see that the
probability that it satisfies (B2) as well is at most $\beta^k$ for

$$\beta = C\alpha^2$$

Now, in a manner similar to Theorem 5.3, we may take a union bound over all $k = 1, \dots, 2^l$
and all vertices $v_0, \dots, v_{k-1}, w_0, \dots, w_{k-1}$ satisfying (B1), (B3), (B4) to see that $E[V_l^t] \le \mathrm{poly}(n)$.

Thus, the overall expected running time is $O(\frac{\log^3 n}{\epsilon}) = O(\log^4 n)$ using a polynomial number of
processors. □


5.4 Higher-order Thue numbers 

Recall the notion of $k$-repetitions introduced in [3]. That is, given a parameter $k$, we want to avoid
the event that a sequence of colors $xx \dots x$ appears on a vertex-simple path, with the string $x$
occurring $k$ times.

It is not hard to extend the analysis of Theorem 5.4 to obtain an algorithm for the $k$-Thue number
as follows:

Theorem 5.6. For some constant $\phi > 0$, there is a Las-Vegas algorithm which takes as input a
graph $G$ and parameter $k$, and produces a vertex coloring with $C = \Delta^{1 + \frac{1}{k-1}} + \phi\Delta^{\frac{2}{3} + \frac{1}{k-1}}$ colors
which avoids $k$-repetitions. This algorithm runs in expected time $n^{k + O(1)}$.

For any fixed value of $k$, this is a polynomial-time algorithm. But developing an algorithm
whose running time scales with $k$ presents new algorithmic challenges. Note that the approach of
[19], which is based on finding a “core” set of bad events which can be checked quickly, will not
work here: the work required to check even the color sequences of length 1 (the simplest class
of bad event) is already $n\Delta^k$, which can be super-polynomial.

Our main result here is: 


Theorem 5.7. For some constant (f> > 0, there is an algorithm with the following properties. It 
takes as input a graph G, a parameter k, and a parameter e. It runs in expected time and 

produces a vertex coloring with C = + <f> A 2//3+fc - 1 colors, which avoids any k-repetitions 

whp. That is, there is no vertex-simple path in which a color sequence is repeated k times. Note 
that this is not a Las-Vegas algorithm. 

Proof. Suppose we are given a fixed $\epsilon > 0$. As in Theorem 5.4, for any bad-event $B$ of length $kl$,
we set p(B) = a kl , where a = (A 14- ^ 31 + ^A 2//3+fe -i) 1 Now observe that for cf> > 0, we have 
a k CA k < 1, so the LLL criterion reduces to 


$$C\alpha \ge 1 + \frac{k\alpha^k C\Delta^{k-1}}{(1 - \alpha^k C\Delta^k)^2} \qquad (8)$$

The LHS of (8) can be written as a function of $\Delta, k, \phi$, and a parameter $\nu = \Delta^{\epsilon/(k-1)}$. By
routine calculus, we see that (8) is indeed satisfied, for all $k, \Delta$, for $\phi$ sufficiently large. (The worst
case comes when $k$ is small, $\nu = 1$, and $\Delta \to \infty$.)

The remaining task is to find any bad events which are true in a current configuration. To 
begin, we will simply ignore any color-sequences whose length is greater than some threshold 
L = for some sufficiently large constant x. We claim that, even though we do not check 

these events explicitly, the probability that any such bad event ever becomes true, is negligible. For 
the probability that there is such a long path is at most has length i > L < 'YaLl nC l A kl a kl ; 
routine analysis shows that this is n So we only need to check the shorter sequences. 

Now, suppose we wish to check for a $k$-repetition involving a color sequence of length $l$. As
we are not attempting to determine exactly the exponent of $n$, we will simplify our task by using
Theorem 3.1, searching the entire graph for repeated color sequences. We will also simply enumerate
over the exact value of the length $l$ of the path, rather than attempting to handle all values of $l$
simultaneously. These simplifications both waste work, but only by a factor of $n^{O(1)}$.

We begin by guessing the full $l$-long color sequence. Once this color sequence $c_0, \dots, c_{l-1}$ is
fixed, we use a branching process; a story at stage $i$ consists of the vertices $v_0, \dots, v_i$ in order,
which agree with the color sequence (that is, $v_j$ has color $c_{j \bmod l}$).

Let us consider the overall cost of this branching process. At the $i$-th level of this process, we
must enumerate over color sequences $c_0, \dots, c_{l-1}$ and possibilities for the vertices $v_0, \dots, v_i$. Thus,
we may write the cost as

$$\text{Cost of $i$-th level} \le \sum_{c_0, \dots, c_{l-1}} \; \sum_{\substack{v_0,\, v_1 \in N(v_0),\, v_2 \in N(v_1), \dots \\ v_0, \dots, v_i \text{ distinct}}} [\chi(v_0) = c_0 \wedge \chi(v_1) = c_1 \wedge \dots]$$

This event-decomposition is in the appropriate form to apply Theorem 3.1. By Proposition 5.2
(using a different definition of $\alpha$), we have $\theta(\chi(v_0) = c_0 \wedge \chi(v_1) = c_1 \wedge \dots \wedge \chi(v_i) = c_i) \le \alpha^{i+1}$. As
there are $C^l$ choices for the colors $c_0, \dots, c_{l-1}$ and $n\Delta^i$ choices for the vertices $v_0, \dots, v_i$, the total
contribution of this expression is at most $C^l n\Delta^i\alpha^{i+1}$. Thus, summing over $i = 0, \dots, kl$, we see that
the overall cost to find bad-events of length $l$ is at most $C^l \sum_{i=0}^{kl} n\Delta^i\alpha^{i+1} \le n^{O(1)} C^l$.

As we are only examining color sequences of length at most L, the expected work overall is at 
most T < n°^C L < 

It is notable in this proof that we need to combine the method of [19], which is based on
identifying a core subset of bad events, with the fast-search method of Theorem 3.1. In this application,
the large bad events cannot be searched efficiently; searching the small “easy” bad events
takes exponential time in general but is polynomial time on the random configurations presented
during the MT algorithm. □

5.5 Approximately-repeated color sequences 

In [27], the idea of non-repeated color sequences was generalized to avoiding $p$-similar color
sequences, for some parameter $0 < p \le 1$. If $x, y$ are two color-sequences of length $l$, we say that $x, y$
are $p$-similar if $x, y$ agree in at least $\lceil pl \rceil$ positions. When $p = 1$, of course, this simply means that
$x = y$. Hence the problem of coloring the graph to avoid $p$-similar color sequences generalizes the
problem of non-repetitive coloring. Although the work of [27] considered the problem for color
sequences alone, this generalization has not been studied in the context of graph coloring. It presents
new algorithmic challenges as well. We present the following result:

Theorem 5.8. There is some constant $\phi > 0$ with the following property. For all $p \in (0, 1]$ and
any graph $G$ with maximum degree $\Delta$, there is a coloring that avoids $p$-similar sequences, with

$$C = p^{-1} (1 - p)^{1 - 1/p} (\Delta^2 + \phi\Delta^{11/6})^{1/p}$$

colors. Furthermore, such a coloring can be found in expected time $n^{O(1)}$.

Proof. Define the usual entropy function $h = h(p) = -(1 - p)\ln(1 - p) - p\ln p$.

We can enumerate the bad events as follows. If we have a sequence $s$ of $2l$ vertices, and an
$l$-dimensional binary vector $w$ which has Hamming weight $H(w) = \lceil pl \rceil$, we define the bad event
$B_{w,s}$ which is that vertices $s_i, s_{l+i}$ have the same color for all indices $i$ with $w_i = 1$. It is not hard
to see that there is a $p$-similar vertex sequence iff there is some $w, s$ where the bad event $B_{w,s}$
occurs. (We can further insist that the vector $w$ has $w_1 = 1$; this gives slightly better bounds but
does not change the asymptotics.)

Set $\mu(B) = \alpha^{2l}$ for a bad-event of length $2l$, where $\alpha = e^{-h/p} (\Delta^2 + \frac{\phi}{2}\Delta^{11/6})^{-1/p}$.

Let us count the bad events involving a vertex v. We enumerate this as follows. There are 
(2/) A 21 - 1 paths involving vertex v. We must check a vector w € {0,1} Z which has a 1 in the 
position corresponding to vertex v; this gives us ( \pfi- 1 ) further choices. Then there are C ^ 
choices for the color sequence shared by x, y. Any such event has probability a 2 . Summing over 
all l gives us a total contribution of 

J>(£) < f>0A 2 ‘- 1 ( 

B involves v 1=1 P ' ' 

oo r(fc +i)/p i — 1 /, .\ 

= ^(a 2 C)‘ £ P'A'HA,) 

k= 1 l=\k/p\ V J 

2a 2p Ae h 

- (1 - a 2p C p A 2 e h ) 2 

Hence the asymmetric LLL criterion for avoiding such $p$-similar color sequences reduces to

$$C\alpha - \frac{2\alpha^{2p} C^p \Delta e^h}{(1 - \alpha^{2p} C^p \Delta^2 e^h)^2} \ge 1$$

Routine calculus shows that the LHS is decreasing in $p$. So the worst case is when $p = 1$; then
simple calculus shows that this is satisfied for $\phi$ sufficiently large.

We now come to the main algorithmic challenge: finding a bad event (if any are currently true).
One might naively expect to apply the branching process of Theorem 5.4: first choose the first
and middle vertex in the path, then branch on the vertices, aborting the search early if the color
sequence so far has too many disagreements. To see why this naive branching process does not
give a polynomial-time algorithm, observe that we will not be able to remove any stories in the
early stages of the branching, because we might have a color sequence $xy$ in which the agreeing
positions all come at the end. Thus, the collection of stories will increase exponentially before
collapsing exponentially. Although the number of final stories is relatively small, the intermediate
story counts can become large. We want the agreeing positions to come fast enough to keep the
number of stories small throughout.

We will branch on the color sequence starting not from the vertices at positions $0, l$ (the first
and middle vertex in the path), but rather starting at positions $i, l + i$ for some well-chosen
$i = 0, \dots, l - 1$. At the $t$-th stage of the branching process, we will branch on the vertices at positions
$i + t, l + i + t$ modulo $2l$. Here, $t = 0$ corresponds to the initial choice of vertices, and $t = 1$
corresponds to choosing the first edge emanating from them. At stage $t$ of the branching, we
insist that the number of agreeing positions seen so far is at least $\lceil tp \rceil$; otherwise we remove that
possibility from the branching process.

To summarize, we use the following algorithm to find bad color sequences of length $2l$:

1. For $a = 0, \dots, l - 1$ repeat the following:

2.    Initialize with a single, null story.

3.    For $t = 0, \dots, l - 1$ do the following:

4.       For each story in the stack, count the number of positions at which the color
         sequences agree so far. If this number is smaller than $\lceil pt \rceil$, remove the story from the
         stack.

5.       For each story remaining in the stack, choose the vertex at positions $(a + t)$ modulo $2l$
         and $(l + a + t)$ modulo $2l$. Extend each story in all valid ways.


We will first show that the running time for this algorithm is polynomially bounded. Let us fix
some value of $a, t$, and consider the expected number of surviving stories. These must correspond to
vertex paths of length $t$ whose color sequences agree on at least $\lceil pt \rceil$ positions. There are $\Delta^{2t} n^{O(1)}$
choices for the vertices. For a fixed path, we can bound the probability that they agree on $\lceil pt \rceil$
positions as at most

$$\binom{t}{\lceil pt \rceil} C^{\lceil pt \rceil} \alpha^{2\lceil pt \rceil} \le n^{O(1)} e^{ht} C^{pt} \alpha^{2pt} \le n^{O(1)} \Bigl( \frac{\Delta^2 + \phi\Delta^{11/6}}{(\Delta^2 + (\phi/2)\Delta^{11/6})^2} \Bigr)^t$$

Hence, the total expected number of stories for given $a, t$ is at most

$$n^{O(1)} \Delta^{2t} \Bigl( \frac{\Delta^2 + \phi\Delta^{11/6}}{(\Delta^2 + (\phi/2)\Delta^{11/6})^2} \Bigr)^t \le n^{O(1)}$$

Next, we must show that any bad event will indeed be discovered by this branching process.
For, suppose $x, y$ are color sequences of length $l$ which agree on $p'l \ge \lceil pl \rceil$ positions. For $i = 1, \dots, l$
define $s_i$ to be the total number of agreements in positions $1, \dots, i$; for $i$ outside this range, extend
$s_i$ by continuing the count cyclically (positions taken modulo $l$). We also define the parameter
$r_i = s_i - p'i$. Because $x, y$ agree on exactly $p'l$ positions, the sequence $r$ is periodic with period $l$.

We claim that for the value of $a$ in the range $1, \dots, l$ which minimizes $r_a$, the color
sequence $xy$ will survive the corresponding branching process. For, suppose at stage $t$, we lose
$xy$. This implies that the total number of agreements between stages $a, a + t$ is strictly less than
$\lceil pt \rceil$ and hence, being an integer, less than $pt \le p't$. This implies that $s_{t+a} < s_a + p't$ and hence
$r_{t+a} < r_a$, contradicting minimality of $a$. □
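
The choice of starting offset in this argument can be checked on small cases. The sketch below is an illustration under assumed conventions: positions are 0-indexed, `agree` is a 0/1 list recording where the two color sequences match, and the pruning rule is applied after each newly examined position. The names `survives` and `good_start` are hypothetical.

```python
from fractions import Fraction
from math import ceil

def survives(agree, a, p):
    # Replay the pruning rule from start a: after examining t cyclic
    # positions, the agreements seen must number at least ceil(p * t).
    l, seen = len(agree), 0
    for t in range(1, l + 1):
        seen += agree[(a + t - 1) % l]
        if seen < ceil(p * t):
            return False
    return True

def good_start(agree):
    # Choose a minimizing r_a = s_a - p' * a, where s_a counts agreements
    # in the first a positions and p' = total / l. Everything is scaled by
    # l so that the comparison stays in exact integer arithmetic.
    l, total = len(agree), sum(agree)
    prefix = [0]
    for bit in agree:
        prefix.append(prefix[-1] + bit)
    return min(range(l), key=lambda a: prefix[a] * l - total * a)
```

For `agree = [0, 0, 1, 1]` and `p = Fraction(1, 2)`, starting at offset 0 is pruned at the first step, while the offset returned by `good_start` survives all stages.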


6 Partially avoiding bad events 

When the LLL condition is satisfied, it is possible to select the variables so that no bad events
occur. Alternatively, if one simply selects the underlying variables from $\Omega$ directly, then each bad
event $B$ occurs with probability $P_\Omega(B)$. However, there can be a middle ground. As described
in [19], even when the LLL condition is violated, one can use the MT-distribution to select the
variables so that many fewer bad events occur than one would expect from $\Omega$. For example, if in
the symmetric LLL setting we have $epd = \alpha$, for $\alpha \in [1, e]$, then one can show that it is possible to
cause at most $(1 + o(1)) m p e \ln(\alpha)/\alpha$ events to occur; here $o(1)$ is a parameter which decreases with
the dependency $d$ [19].

The result of [19] is based on the following idea: select each event to be a “core event”
independently with probability $q$. These core events will not be allowed to occur; the non-core events are
ignored. Each core event has on average $dq$ core neighbors. For $d$ sufficiently large, one can apply
Chernoff bounds and the MT algorithm to ensure that the number of core neighbors is close to $dq$.
Now, apply the MT algorithm a second time to avoid the core events, and show that in the MT
distribution the non-core events have a high probability of being avoided.

While the method of [19] is intriguing, it suffers from a few shortcomings. First, the result is
asymptotic; there is a second-order term, which is difficult to compute explicitly, and only goes
away as $d \to \infty$. Second, this algorithm may be computationally expensive; the first application of
the LLL, in particular, may dominate the second, “real” application, and may even be exponential
time. Third, one obtains only gross bounds on the total number of true bad events; one cannot
easily get more detailed information on the average behavior of a particular bad event.

In this section, we give new bounds and algorithms for partially avoiding bad events, which
avoid these problems. In many cases, these algorithms are faster than the Moser-Tardos algorithm
itself. The basic idea parallels [19], in that we mark each bad event $B$ as core with probability
$q(B)$. However, instead of using two separate LLL phases, we combine them into a single one.

Recall the definition of $\theta(\cdot)$ from (2).

Theorem 6.1. Suppose we are given a mapping $\mu : \mathcal{B} \to [0, \infty)$. Then there is an algorithm,
which we refer to as the Truncated Moser-Tardos Algorithm, whose output distribution $\Omega'$ on the
underlying variables $X_1, \dots, X_n$ has the property

$$\forall B \in \mathcal{B}, \quad P_{\Omega'}(B) \le \max(0, \theta(B) - \mu(B)) \qquad (9)$$

This algorithm has the same running-time behavior as other Moser-Tardos applications. In
particular, the expected number of resamplings of a bad event $B$ is at most $\mu(B)$. (Note that the LLL criterion is
simply that the RHS of (9) is equal to zero.)

Proof. Given our original set of bad events $\mathcal{B}$, we define a new binary variable $Y(B)$ for each bad
event, which is Bernoulli-$q(B)$ and which represents that $B$ is “core”. We introduce a new set of
bad events $\mathcal{B}'$, defined as follows: for each bad event $B \in \mathcal{B}$, we define $B' \in \mathcal{B}'$ to be the event that
$B$ is true and $Y(B) = 1$, where we define $q(B) = \min(1, \frac{\mu(B)}{\theta(B)})$. The truncated MT algorithm for $\mathcal{B}$
is then defined by running the MT algorithm for $\mathcal{B}'$.

It is not hard to see that the set of bad events $\mathcal{B}'$ satisfies the asymmetric LLL criterion with
the weighting function $\mu$.

Now, consider a bad event $B$. In order for $B$ to occur in the output, it must be the case that
$Y(B) = 0$. Thus, we have that $P_{\Omega'}(B) = P_{\Omega'}(B \wedge (Y(B) = 0))$. We now apply Proposition 2.5, so
that $P_{\Omega'}(B \wedge (Y(B) = 0)) \le \theta(B \wedge (Y(B) = 0)) = \theta(B) P_\Omega(Y(B) = 0) = (1 - q(B))\theta(B)$. By our
choice of $q(B)$, this is $\max(0, \theta(B) - \mu(B))$. □
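
The construction in this proof can be sketched on a toy instance. The code below is illustrative only: it assumes the bad events are falsified clauses of a small formula, takes the core probabilities `q` as given rather than computing $\min(1, \mu(B)/\theta(B))$, and always resamples the first true core event. The names `violated` and `truncated_mt` are hypothetical.

```python
import random

def violated(clause, x):
    # A clause is a tuple of (variable, wanted_value) literals; its bad
    # event is true when no literal is satisfied by the assignment x.
    return all(x[v] != want for v, want in clause)

def truncated_mt(n_vars, clauses, q, rng, max_steps=100_000):
    x = [rng.random() < 0.5 for _ in range(n_vars)]
    y = [rng.random() < q[i] for i in range(len(clauses))]  # core marks Y(B)
    for steps in range(max_steps):
        # The modified bad event B' is "B is true and Y(B) = 1".
        bad = [i for i, c in enumerate(clauses) if y[i] and violated(c, x)]
        if not bad:
            return x, y, steps
        i = bad[0]
        for v, _ in clauses[i]:        # resample the variables of B ...
            x[v] = rng.random() < 0.5
        y[i] = rng.random() < q[i]     # ... and its core mark Y(B)
    raise RuntimeError("step limit reached")
```

On termination every clause is either satisfied or unmarked, so an unsatisfiable pair such as $(x_0)$ and $(\lnot x_0)$ no longer prevents the process from halting.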

This specializes easily to the symmetric setting by setting $\mu(B) = (e/\alpha)^{1/d} - 1$ for all $B$:

Corollary 6.2. Suppose each bad event $B$ has $P_\Omega(B) \le p$, $|N(B)| \le d$; and suppose that $epd \le \alpha$
for $\alpha \in [1, e]$. Then one can efficiently construct a probability space $\Omega'$ in which each bad
event $B$ occurs with probability at most $ep\ln(\alpha)/\alpha$. The expected number of total resamplings to
draw from $\Omega'$ is $O(m/d)$.
6.1 Applications


As an example of the asymmetric form of Theorem 16.11 consider /c-SAT instances where each 
variable may appear in up to L clauses in total (positively or negatively). Applying the Lopsided 
LLL, it is shown in m that L < implies that the instance is satisfiable. We prove that this 

can be relaxed so that the instance is partially satisfiablelll 

Theorem 6.3. Suppose we have a $k$-SAT instance with $m$ clauses, in which each variable appears
in up to $L \le \frac{\alpha 2^{k+1}}{ek} - 2/k$ clauses (in total, either positively or negatively), for $\alpha \in [1, e]$. Then we
can construct in expected time $m \log^{O(1)} m$ a truth assignment whose expected number of satisfied
clauses is at least $m(1 - 2^{-k} e \ln(\alpha)/\alpha)$.

Proof. We assume that $m \ge 2^{k-1}$, as otherwise a randomly chosen solution will satisfy all the clauses
with probability at least 1/2, and the result follows trivially.

Suppose a variable $x_i$ appears in $l_i$ clauses; of these occurrences, it appears $\delta_i l_i$ positively and
$(1 - \delta_i) l_i$ negatively. Then, following the counter-intuitive choice described in [16], we set variable
$i$ to be T with probability $1/2 - x(\delta_i - 1/2)$, where $x \in [0, 1]$ is a well-chosen parameter.

We set $\mu(B) = z$ for all bad events $B$, where $z$ is a parameter to be chosen. In this case, it
suffices to show that

$$\forall B \in \mathcal{B}, \quad -z + P_\Omega(B) \exp\Bigl(\sum_{B' \sim B} z\Bigr) \le 2^{-k} e \ln(\alpha)/\alpha \qquad (10)$$

It is not hard to show, following [16], that for $x = Lz/2$ the LHS here is maximized when
variables corresponding to the bad event $B$ each occur in exactly $L/2$ clauses positively or
negatively; and that in this case, we have $P_\Omega(B) = 2^{-k}$, and there are $1 + Lk/2$ neighbors of $B$ in
the dependency graph. (The factor of $L/2$ here comes from the Lopsided LLL; namely, clauses
that intersect on a variable and agree on it are not counted as dependent for the purposes of the
Lopsided LLL.)


Thus, we set $z = \frac{2\ln\left(\frac{2^{k+1}}{2 + kL}\right)}{2 + kL}$ and then we have the bound

$$-z + P_\Omega(B) \exp\Bigl(\sum_{B' \sim B} z\Bigr) \le -z + 2^{-k} \exp(z(1 + Lk/2)) = \frac{2\ln(1 + kL/2) + 2 - k\ln 4}{2 + kL} \le 2^{-k} e \ln(\alpha)/\alpha$$

Now, the expected number of resamplings is at most $mz \le m \log^{O(1)} m / L$. For each resampling,
we must scan all the affected clauses to see if they have become falsified, which takes time $k^{O(1)} L \le
L \log^{O(1)} m$. Hence the total expected runtime is $m \log^{O(1)} m$. □


We can also apply this result for partial Latin transversals. Although our theorems have been 
stated in the context of the standard Moser-Tardos algorithm, they only depend on the Witness 
Tree Lemma. As we have discussed earlier, such results apply in essentially the same way for the 
permutation-LLL setting described in [2Tlj . 

Definition 6.4. Given an $n \times n$ matrix $A$, a partial Latin transversal is a selection of $k \le n$ cells,
at most one in each row and column, with the property that there are no two selected cells with the
same color.

7 One may verify that Theorem 6.1 holds for the variable-assignment LLLL, in which bad-events are dependent iff
they disagree on a variable.
Partial Latin transversals have been most studied in the case when $A$ is a Latin square. In [38],
Stein analyzes the case of partial Latin transversals for arbitrary matrices. Using techniques from
that paper, one can show the existence of partial Latin transversals, whose length is a function of
A, the maximum number of occurrences of any color. This generalizes HU, which showed that if 
A is sufficiently small, then a full Latin transversal exists. 

Theorem 6.5. Suppose each color appears at most $\Delta = \beta n$ times in the matrix $A$, for $\beta \in [0, 1]$.
Then one can construct a partial Latin transversal of length at least $n \times \frac{1 - e^{-\beta}}{\beta}$.

Proof. Suppose that we select a random permutation $\pi$; whenever a color appears more than once
in $\pi$, we will remove all but one of those cells from it to turn it into a partial Latin transversal.

Suppose that a color appears $d \le n$ times in the matrix. As shown in [38], the probability that
$\pi$ meets the color at least once is minimized when all $d$ occurrences of the color are in distinct rows
and columns; in this case the probability is (by negative correlation) at least $1 - (1 - 1/n)^d$.

Thus, summing over all colors $i$, the total expected number of colors appearing in $\pi$ is at least
$\sum_i 1 - (1 - 1/n)^{d_i}$. By concavity, and using the facts that $d_i \le \Delta$, $\sum_i d_i = n^2$, this is at least

$$\frac{n}{\beta}(1 - e^{-\beta}).$$

Thus, the resulting partial Latin transversal has an expected length of at least $n\bigl(\frac{1 - e^{-\beta}}{\beta}\bigr)$ as we
claimed. □
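
The construction used in this proof is short enough to sketch directly. The code below is an illustration, assuming the matrix is given as a list of rows of hashable colors; the function name `partial_transversal` is hypothetical, and keeping the first cell of each color is one arbitrary way to "remove all but one".

```python
import random

def partial_transversal(A, rng):
    n = len(A)
    perm = list(range(n))
    rng.shuffle(perm)                     # uniformly random permutation pi
    kept = {}                             # color -> one cell (i, pi(i))
    for i, j in enumerate(perm):
        kept.setdefault(A[i][j], (i, j))  # keep the first cell of each color
    return sorted(kept.values())
```

The returned cells lie in distinct rows and columns because they come from a permutation, and carry distinct colors because only one cell per color is kept.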


We can improve on Theorem 6.5 for $\beta < 0.19$ by using the MT-distribution. (Note that for
$\beta < 0.105$, the LLL constructs a full Latin transversal.)

Theorem 6.6. Suppose each color appears at most A = fdn times in the matrix A, for (3 € 
[0,1/4]. Then the truncated MT algorithm runs in expected time 0(n) and produces a partial Latin 

transversal whose expected length is at least n ■ min ^ 1, 4 + 2Q 2 ^ ■ 

Proof. For every pair of cells (i,j), (i',j') such that A(i,j) = A(i',j'), we have a bad- event 7r(i) = 
j A ir(i') = j'. We apply Theorem 16.11 setting pi(B) = a = n^-i) ~ l) f° r eaf ’k such 

bad-event. In each independent set of neighbors of a bad-events, for each of the four coordinates 
i, j. i'■ j'. one may select zero or one bad-events which overlap on that coordinates. 

Thus, the space $\Omega'$ has the property that for each $B$ we have:


Pci' (B) < max(0, 0(B) - n(B)) 

, 1 

< max(0, —a -|- - -- 

n[n — 1) 

< max (o, 

V ’ n(A - 1) 


(1 + n(A - l)a) 4 ) 


Now consider the following experiment: we draw the permutation $\pi$ from the space $\Omega'$. For
each bad-event that occurs, we de-activate one of the two cells (chosen arbitrarily). Let $Q$ denote
the number of active cells at the end of this process; then $E_{\Omega'}[Q] \ge n - \sum_B P_{\Omega'}(B)$.

The total number of bad-events can be computed as follows. First, there are $n^2$ choices for $i, j$.
Next, there are $\Delta - 1$ choices for $i', j'$. This double-counts the number of bad-events, so in all there
are at most $n^2(\Delta - 1)/2$ bad-events.

Thus 


Esy [Q]>n — 


n 2 ( A — 1) 


x maxi 0, 


i _ 3_v4L^rWl 
8(A—l) 1 / 3 

n(A — 1) 


> 


n mm 




—) 

2048/3/ 
□

6.2 A faster parallel (RNC) algorithm

Suppose we wish to use the parallel MT algorithm to draw from the sample space $\Omega'$ such that:

$$\forall B \in \mathcal{B}, \quad P_{\Omega'}(B) \le \max(0, \theta(B) - \mu(B))$$

In the symmetric setting (with epd = a), and using the choice of p from Corollary 16.21 one can 
easily verify that the parallel MT algorithm, as described in [32] , will terminate after 
rounds whp. (The approach of [T9], based on two applications of LLL, will give the same result.) 
The running time of the parallel MT algorithm is dominated by selecting a maximal independent set
(MIS) of true bad events (in this case, with the additional property that $Y(B) = 1$). As finding an
MIS requires $O(\log^2 m)$ parallel time (using Luby’s MIS algorithm [30]), the total runtime
of parallel MT would be 

We can improve this running time by only running the parallel MT algorithm for a constant
number of rounds, using a slightly higher resampling probability than indicated in Theorem 6.1.
Unfortunately, we are not able to show a simple condition analogous to the asymmetric LLL for 
this algorithm to work. Unlike the Moser-Tardos algorithm, which “converges” to a good solution, 
we give an algorithm which “over-converges” to the desired solution. It reaches a good distribution 
faster than Moser-Tardos, but then it moves away from the good distribution. This algorithm seems 
to require a “uniformity” among the bad events, which is by definition true for the Symmetric LLL 
but seems harder to formalize in general. 

We may now define a parallel algorithm corresponding to the Truncated Moser-Tardos
Algorithm. It differs from the usual parallel Moser-Tardos algorithm in two key ways. First, we maintain
for each bad event $B$ a resampling variable $Y(B)$ which is Bernoulli-$q(B)$, where $q(B) \in [0, 1]$ is a
parameter to be chosen, and we only resample bad events (including $Y(B)$ itself) when $Y(B) = 1$.
Second, instead of running the algorithm until there are no more true bad events, we run it for
some fixed number $t$ of iterations. We note that the choice of $q(B)$ is not an “equilibrium” value,
as in Theorem 6.1; this makes the parallel algorithm more challenging to analyze.

Lemma 6.7. Suppose we are given a family of functions $\sigma_i : \mathcal{B} \to [0, \infty)$ for $i = 1, \dots, t + 1$ as
well as probabilities $q : \mathcal{B} \to [0, 1]$, satisfying the recurrence for $i = 1, \dots, t$:

$$\sigma_1(B) \ge q(B) P_\Omega(B)$$

$$\sigma_{i+1}(B) \ge \sigma_i(B) + q(B) P_\Omega(B) \sum_{\substack{X \subseteq N(B) \\ X \text{ independent}}} \Bigl( \prod_{B' \in X} \sigma_i(B') - \prod_{B' \in X} \sigma_{i-1}(B') \Bigr)$$

Then, if the Parallel Truncated Moser-Tardos Algorithm is terminated after $t$ iterations,
each $B$ is true at that point with probability

$$P(B \text{ true after } t \text{ iterations}) \le \frac{\sigma_{t+1}(B)}{q(B)} - \sigma_t(B)$$

Proof. We define $\sigma_0(B) = 0$ for each $B \in \mathcal{B}$. For each witness tree $\tau$ whose nodes are labeled
$B_1, \dots, B_s$, define the weight $w(\tau) = \prod_{j=1}^{s} q(B_j) P_\Omega(B_j)$.

Let $T_i(B)$ denote the total weight of all witness trees of height $i$ rooted in $B$, and let $T_{\le i}(B) =
\sum_{j \le i} T_j(B)$. We claim that $T_i(B) \le \sigma_i(B) - \sigma_{i-1}(B)$ for $i = 1, \dots, t + 1$. We shall show this by
induction on $i$. Note that this automatically implies that $T_{\le i}(B) \le \sigma_i(B)$ (the sum telescopes).
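Suppose a witness tree rooted in $B$ has height $i$. Let $A_1, A_2$ denote the sets of neighbors of $B$ whose subtrees have
height $i - 1$ and $\le i - 2$ respectively. We must have $A_1 \neq \emptyset$ in order for the tree to have height $i$. For a fixed
choice of $A_1, A_2$, the total weight of all such trees is $q(B) P_\Omega(B) \prod_{B_1 \in A_1} T_{i-1}(B_1) \prod_{B_2 \in A_2} T_{\le i-2}(B_2)$.

Thus, summing over $A_1, A_2$ we have:

$$T_i(B) \le q(B) P_\Omega(B) \sum_{\substack{A_1, A_2 \subseteq N(B) \\ A_1 \neq \emptyset,\ A_1 \cap A_2 = \emptyset \\ A_1 \cup A_2 \text{ independent}}} \prod_{B_1 \in A_1} T_{i-1}(B_1) \prod_{B_2 \in A_2} T_{\le i-2}(B_2)$$

$$\le q(B) P_\Omega(B) \sum_{\substack{A_1, A_2 \subseteq N(B) \\ A_1 \neq \emptyset,\ A_1 \cap A_2 = \emptyset \\ A_1 \cup A_2 \text{ independent}}} \prod_{B_1 \in A_1} \bigl(\sigma_{i-1}(B_1) - \sigma_{i-2}(B_1)\bigr) \prod_{B_2 \in A_2} \sigma_{i-2}(B_2)$$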


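In order to evaluate this sum, we first remove the restriction that $A_1 \neq \emptyset$, and then we subtract
off the terms with $A_1 = \emptyset$. In the former case, we would have

$$\sum_{\substack{A_1, A_2 \subseteq N(B) \\ A_1 \cap A_2 = \emptyset \\ A_1 \cup A_2 \text{ independent}}} \prod_{B_1 \in A_1} \bigl(\sigma_{i-1}(B_1) - \sigma_{i-2}(B_1)\bigr) \prod_{B_2 \in A_2} \sigma_{i-2}(B_2)$$

$$= \sum_{\substack{I \subseteq N(B) \\ I \text{ independent}}} \; \sum_{\substack{A_1 \subseteq I \\ A_2 = I - A_1}} \prod_{B_1 \in A_1} \bigl(\sigma_{i-1}(B_1) - \sigma_{i-2}(B_1)\bigr) \prod_{B_2 \in A_2} \sigma_{i-2}(B_2)$$

$$= \sum_{\substack{I \subseteq N(B) \\ I \text{ independent}}} \prod_{B' \in I} \Bigl( \bigl(\sigma_{i-1}(B') - \sigma_{i-2}(B')\bigr) + \sigma_{i-2}(B') \Bigr)$$

$$= \sum_{\substack{I \subseteq N(B) \\ I \text{ independent}}} \prod_{B' \in I} \sigma_{i-1}(B')$$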


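On the other hand, the contribution from $A_1 = \emptyset$ is given by

$$\sum_{\substack{A_1, A_2 \subseteq N(B) \\ A_1 = \emptyset,\ A_1 \cap A_2 = \emptyset \\ A_1 \cup A_2 \text{ independent}}} \prod_{B_1 \in A_1} \bigl(\sigma_{i-1}(B_1) - \sigma_{i-2}(B_1)\bigr) \prod_{B_2 \in A_2} \sigma_{i-2}(B_2) = \sum_{\substack{I \subseteq N(B) \\ I \text{ independent}}} \prod_{B' \in I} \sigma_{i-2}(B')$$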

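Putting these together, we have that

$$T_i(B) \le q(B) P_\Omega(B) \sum_{\substack{X \subseteq N(B) \\ X \text{ independent}}} \Bigl( \prod_{B' \in X} \sigma_{i-1}(B') - \prod_{B' \in X} \sigma_{i-2}(B') \Bigr)$$

$$\le \sigma_i(B) - \sigma_{i-1}(B) \qquad \text{(by hypothesis)}$$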

Now consider the event that bad event $B$ is true after $t$ rounds of the parallel algorithm. We
may construct a witness tree for this event; it has height $\le t + 1$. If $Y(B) = 1$ after $t$ rounds,
then it must be the case that this tree has height exactly $t + 1$; for, either $B$ or a neighbor would
have been resampled at round $t$. Hence the probability that $B$ remains true after $t$ rounds can
be described by either a witness tree of height $t + 1$, rooted in $B$; or a witness tree of height $\le t$,
rooted in $(Y(B) = 0) \wedge B$. Furthermore, for every event in the witness tree, other than the root
node $B$, we require that $Y(B') = 1$ at the appropriate time. Thus, in total, we have

$$P(B \text{ true after } t \text{ rounds}) \le \frac{T_{t+1}(B) + T_{\le t}(B)(1 - q(B))}{q(B)}$$

$$\le \frac{\sigma_{t+1}(B) - \sigma_t(B) + \sigma_t(B)(1 - q(B))}{q(B)}$$

$$= \frac{\sigma_{t+1}(B)}{q(B)} - \sigma_t(B)$$

as desired. □


And this specializes to the symmetric setting: 

Theorem 6.8. Suppose $epd \le \alpha$ for $\alpha \in (1, e]$. Let $\Omega'$ be the distribution induced on the
variables after running the Parallel Truncated Moser-Tardos Algorithm for $t$ steps, where $t$ is chosen
appropriately as a function of $p, d, \alpha$ and $t = O((\alpha - 1)^{-1})$. In the space $\Omega'$, bad events have
probability $P_{\Omega'}(B) \le \frac{\ln \alpha}{d}$.

This can be implemented as a parallel (RNC) algorithm running in 0 (^_p) time. This can 
also be implemented as a distributed algorithm running in O( ^rrp ) rounds (if p,d, a are globally 
known parameters) 

Proof. We note that if d = 1, then all the events are completely independent. We can run t 
rounds of resampling, and each bad-event remains true with probability at most p t . Thus, we need 
t = 1 + 1+ 1 I 1 I 1 1 p 1Q < 0 ( ig$g ) < 0(-B- j-) rounds of resampling in order to ensure that p l < 
Henceforth we assume $d \ge 2$.

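We next discuss how to select the parameters $t, q$. Let us define

\[ r = \frac{1}{d} \left( \frac{d-1}{d - \ln \alpha} \right)^{d-1} \]

We claim that $r \ge p$: for $\frac{rd}{\alpha} = \frac{1}{\alpha} \left( \frac{d-1}{d - \ln \alpha} \right)^{d-1}$ is a decreasing function of $\alpha$, and hence it can be lower-bounded by its value at $\alpha = e$. Thus we have

\[ \frac{rd}{\alpha} \ge \frac{1}{e} \left( \frac{d-1}{d-1} \right)^{d-1} = \frac{1}{e}, \]

and so $r \ge \frac{\alpha}{ed} \ge p$, using $epd \le \alpha$.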

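For all $B \in \mathcal{B}$, define $q(B) = \beta$, for some parameter $\beta$ to be chosen. Define $\sigma_i(B) = \gamma_i(\beta)$ where
$\gamma_i(\beta)$ is defined recursively as follows:

\[ \gamma_0(\beta) = 0, \qquad \gamma_{i+1}(\beta) = \beta r (1 + \gamma_i(\beta))^d \]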

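We first claim that $\gamma_{i+1}(\beta) \ge \gamma_i(\beta)$ for all $i \ge 0$. We show this by induction on $i$. It is clear for
$i = 0$. For $i > 0$, we have:

\begin{align*}
\gamma_{i+1}(\beta) &= \beta r (1 + \gamma_i(\beta))^d \\
&\ge \beta r (1 + \gamma_{i-1}(\beta))^d && \text{induction hypothesis} \\
&= \gamma_i(\beta)
\end{align*}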
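Next, we claim that this definition of $q, \sigma$ satisfies the conditions of Lemma 6.7. For, we have:

\begin{align*}
\sigma_i(B) &+ q(B) P_\Omega(B) \sum_{\substack{X \subseteq N(B) \\ X \text{ independent}}} \Bigl( \prod_{B' \in X} \sigma_i(B') - \prod_{B' \in X} \sigma_{i-1}(B') \Bigr) \\
&= \gamma_i(\beta) + P_\Omega(B) \beta \sum_{\substack{X \subseteq N(B) \\ X \text{ independent}}} \bigl( \gamma_i(\beta)^{|X|} - \gamma_{i-1}(\beta)^{|X|} \bigr) \\
&\le \gamma_i(\beta) + P_\Omega(B) \beta \bigl( (1 + \gamma_i(\beta))^d - (1 + \gamma_{i-1}(\beta))^d \bigr) && \text{as } |N(B)| \le d \text{ and } \gamma_i(\beta) \ge \gamma_{i-1}(\beta) \\
&\le \gamma_i(\beta) + r \beta \bigl( (1 + \gamma_i(\beta))^d - (1 + \gamma_{i-1}(\beta))^d \bigr) && \text{as } P_\Omega(B) \le p \le r \\
&= \gamma_{i+1}(\beta) = \sigma_{i+1}(B)
\end{align*}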

Let $z = \frac{1 - \ln \alpha}{d - 1} \ge 0$. We claim that for $t$ sufficiently large, there is some $\beta \in [0, 1]$ with
$\gamma_t(\beta) = z$. We will show this by continuity. Each $\gamma_i(\beta)$ is an increasing function of $\beta$ with
$\gamma_i(0) = 0$. Furthermore, we claim that we have for $t \ge 1$:

\[ \gamma_t(1) \ge r \lambda^{t-1} \quad \text{for } \lambda = r(d-1)\bigl(1 + 1/(d-1)\bigr)^d \tag{11} \]

The reason for (11) is that for $i > 0$ we have $\frac{\gamma_{i+1}(1)}{\gamma_i(1)} = \frac{r (1 + \gamma_i(1))^d}{\gamma_i(1)}$. Now observe that for all
$v > 0$ we have $\frac{(1+v)^d}{v} \ge (d-1)\bigl(1 + 1/(d-1)\bigr)^d$.

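Observe that $\lambda = \left( \frac{d}{d - \ln \alpha} \right)^{d-1} > 1$. So, for $t \ge \lceil \max(0, \ln(z/r)) / \ln \lambda \rceil$, we have $\gamma_t(1) \ge z$. Note
that $z/r = \frac{d(1 - \ln \alpha)}{d-1} \left( \frac{d - \ln \alpha}{d-1} \right)^{d-1}$. Simple calculus shows that this is $O(1)$ for $d \ge 2$. Similarly, simple
calculus shows that $\lambda \ge 1 + \Omega(\alpha - 1)$. So, for $t \ge \Omega(\frac{1}{\alpha - 1})$ we have that $\gamma_t(1) \ge z$. This implies
that there is some $\beta \in [0, 1]$ and some choice of $t \le O(\frac{1}{\alpha - 1})$ with $\gamma_t(\beta) = z$ exactly.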

Now, Theorem 6.7 applies, and so the probability that any $B$ is true after $t$ rounds is at most

\[ \frac{\sigma_{t+1}(B)}{q(B)} - \sigma_t(B) = r (1 + \gamma_t(\beta))^d - \gamma_t(\beta) = r(1+z)^d - z = \frac{\ln \alpha}{d} \]

So far, we have shown by continuity that there is some choice of $\beta$ for which the parallel MT
algorithm would induce $P_{\Omega'}(B) \le \frac{\ln \alpha}{d}$. In the distributed setting, where computation is free, we
can assume that each node is able to determine this value of $\beta$ to any desired precision. To give a
full parallel algorithm, we need to show that it is possible to determine such $\beta$ efficiently. In fact,
we only use $\beta$ as a sampling probability; thus, the probability that we need to determine its $i$th bit
decreases exponentially in i. So whp it suffices to compute 0(log(^j)) bits of it. 

Recall that $\beta$ is the root of $\gamma_t(\beta) - z$ in the range $\beta \in [0, 1]$. We can determine this root via
numerical bisection. It requires 0(log(-py)) rounds of bisection, and each such bisection can be 
performed in 0 ( 1 °^™ ) steps. 

□ 
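The parameter selection in the proof can be traced numerically. The sketch below is illustrative only and rests on stated assumptions: it takes $r = \frac{1}{d}\bigl(\frac{d-1}{d-\ln\alpha}\bigr)^{d-1}$, $z = \frac{1-\ln\alpha}{d-1}$ and the recursion $\gamma_{i+1}(\beta) = \beta r (1+\gamma_i(\beta))^d$ as read from the proof, uses ordinary floating-point bisection rather than the bit-by-bit procedure the proof has in mind, and the function names and the sample values $d = 4$, $\alpha = 1.1$ are hypothetical choices.

```python
import math


def params(d, alpha):
    """Return (r, z, lam) for degree bound d and slack alpha in (1, e)."""
    log_a = math.log(alpha)
    r = ((d - 1) / (d - log_a)) ** (d - 1) / d
    z = (1 - log_a) / (d - 1)
    lam = r * (d - 1) * (1 + 1 / (d - 1)) ** d
    return r, z, lam


def gamma(t, beta, r, d):
    """gamma_t(beta): t steps of g -> beta * r * (1 + g)^d starting from 0."""
    g = 0.0
    for _ in range(t):
        g = beta * r * (1 + g) ** d
    return g


def find_t(r, z, d, t_max=100_000):
    """Smallest t >= 1 with gamma_t(1) >= z."""
    for t in range(1, t_max + 1):
        if gamma(t, 1.0, r, d) >= z:
            return t
    raise ValueError("gamma_t(1) did not reach z within t_max rounds")


def solve_beta(t, r, z, d, iters=60):
    """Bisection for beta in [0, 1] with gamma_t(beta) = z.

    Relies on gamma_t being increasing in beta, with gamma_t(0) = 0 < z
    and gamma_t(1) >= z.
    """
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if gamma(t, mid, r, d) < z:
            lo = mid
        else:
            hi = mid
    return hi


# Arbitrary sample parameters (assumptions for illustration).
D, ALPHA = 4, 1.1
R, Z, LAM = params(D, ALPHA)
T = find_t(R, Z, D)
BETA = solve_beta(T, R, Z, D)
BOUND = R * (1 + Z) ** D - Z  # should match ln(alpha) / d
print(T, BETA, BOUND, math.log(ALPHA) / D)
```

With these choices, `BOUND` reproduces $\ln(\alpha)/d$ up to rounding, and `LAM` agrees with the closed form $\bigl(\frac{d}{d-\ln\alpha}\bigr)^{d-1}$ used in the proof.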


7 Entropy of the MT-distribution 

One of the main themes of this paper has been that the MT-distribution has a high degree of
randomness, comparable to the randomness of the original distribution $\Omega$. One more quantitative
measure of this is the Rényi entropy of the MT-distribution.
Definition 7.1 ([12]). Let $V$ be a distribution on a finite set $S$. We define the Rényi entropy with
parameter $\rho$ of $V$ to be

\[ H_\rho(V) = \frac{1}{1 - \rho} \ln \sum_{v \in S} P_V(v)^\rho \]

The entropy of any distribution is at most $\ln |S|$, which is achieved by the uniform distribution,
and so $H_\rho$ measures how close a distribution is to uniform. The min-entropy is a special case

\[ H_\infty(V) = -\ln \max_{v \in S} P_V(v) = \lim_{\rho \to \infty} H_\rho(V) \]


See, e.g., [HUM SD] for the centrality of this notion. 
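As a small illustration of Definition 7.1, the following sketch computes the Rényi entropy and the min-entropy of a finite distribution given as a list of probabilities. The function names and the two sample distributions are assumptions made for this example only.

```python
import math


def renyi_entropy(probs, rho):
    """H_rho = ln(sum_v p_v^rho) / (1 - rho), for rho > 0 and rho != 1."""
    if rho <= 0 or rho == 1:
        raise ValueError("rho must be positive and different from 1")
    return math.log(sum(p ** rho for p in probs if p > 0)) / (1 - rho)


def min_entropy(probs):
    """H_infinity = -ln(max_v p_v), the limit of H_rho as rho grows."""
    return -math.log(max(probs))


UNIFORM = [1 / 8] * 8                 # attains the maximum ln|S|
SKEWED = [0.5, 0.25, 0.125, 0.125]    # arbitrary non-uniform example

print(renyi_entropy(UNIFORM, 2), math.log(8))
print(renyi_entropy(SKEWED, 2), renyi_entropy(SKEWED, 50), min_entropy(SKEWED))
```

For the skewed example, the values decrease as $\rho$ grows and approach the min-entropy $\ln 2$, while $\exp(H_\rho)$ never exceeds the support size 4.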

It is possible to use the LLL directly for combinatorial enumeration. Suppose that, when
drawing from $\Omega$, the bad-events are avoided with probability at least $p$; then it follows that
the number of solutions is at least $p|S|$. This principle was used in [29], which counted certain types
of permutations and matchings in this way. The entropy can also be used as a tool for enumerative
combinatorics; namely, if $\Omega'$ is the distribution at the end of the MT algorithm, we know that
the total number of solutions (i.e. combinatorial structures avoiding the bad-events) is at least
$\exp(H_\rho(\Omega'))$ (for any choice of $\rho$).

The LLL gives bounds on the number of configurations which are essentially identical to those
derived by analyzing the MT distribution. However, the MT distribution has a key advantage:
one may efficiently sample from it. The LLL distribution, by
contrast, is a conditional distribution. In this sense, one may view the enumerative bounds produced
from the MT distribution as being constructive. Of course, for most applications
of the LLL, the number of satisfying assignments is exponentially large, and so it is impossible to
give a truly constructive enumerative algorithm for them.

Our main result on the entropy of the MT-distribution is given by: 

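Theorem 7.2. Let $\Omega'$ be the MT-distribution; then for $\rho > 1$ we have

\[ H_\rho(\Omega') \ge H_\rho(\Omega) - \frac{\rho}{\rho - 1} \ln \sum_{\substack{I \subseteq \mathcal{B} \\ I \text{ independent}}} \prod_{B \in I} \mu(B) \]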


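Proof. Consider some atomic event $E$ defined by $X_1 = v_1 \wedge \dots \wedge X_n = v_n$. By Proposition 2.5,
the probability that $E$ occurs at the end of MT is at most $\theta(E)$. Now observe that

\[ \theta(E) \le P_\Omega(E) \sum_{\substack{I \subseteq \mathcal{B} \\ I \text{ independent}}} \prod_{B \in I} \mu(B) \]

Letting $x = \sum_{I \subseteq \mathcal{B},\ I \text{ independent}} \prod_{B \in I} \mu(B)$, we thus have:

\begin{align*}
H_\rho(\Omega') &= \frac{1}{1 - \rho} \ln \sum_v P_{\Omega'}(v)^\rho \\
&\ge \frac{1}{1 - \rho} \ln \sum_v \bigl( x P_\Omega(v) \bigr)^\rho \\
&\ge \frac{\rho}{1 - \rho} \ln x + \frac{1}{1 - \rho} \ln \sum_v P_\Omega(v)^\rho
\end{align*}

and the last expression equals $H_\rho(\Omega) - \frac{\rho}{\rho - 1} \ln x$. (The first inequality uses $\frac{1}{1-\rho} < 0$ for $\rho > 1$.)

□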

We can think of the term $\sum_{I \subseteq \mathcal{B},\ I \text{ independent}} \prod_{B \in I} \mu(B)$ as a distortion factor between $\Omega$ and $\Omega'$.

The following is a crude but simple estimate of this factor:
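Proposition 7.3. We have

\[ \ln \sum_{\substack{I \subseteq \mathcal{B} \\ I \text{ independent}}} \prod_{B \in I} \mu(B) \le \sum_{B \in \mathcal{B}} \mu(B) \]

Proof. We have

\[ \sum_{\substack{I \subseteq \mathcal{B} \\ I \text{ independent}}} \prod_{B \in I} \mu(B) \le \prod_{B \in \mathcal{B}} (1 + \mu(B)) \le \exp\Bigl( \sum_{B \in \mathcal{B}} \mu(B) \Bigr) \]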

and the claim follows. □ 
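The chain of inequalities in this proof can be checked by brute force on a small instance. The sketch below is an illustration under assumptions: the four bad-events, their cyclic dependency graph, and the common weight $\mu(B) = 0.2$ are invented for the example, and `independent_set_sum` is a hypothetical helper that enumerates all independent sets.

```python
from itertools import combinations
from math import prod, exp, isclose


def independent_set_sum(mu, edges):
    """Sum over independent sets I of the dependency graph of prod_{B in I} mu(B)."""
    events = list(mu)
    total = 0.0
    for size in range(len(events) + 1):
        for cand in combinations(events, size):
            if all(frozenset(pair) not in edges for pair in combinations(cand, 2)):
                total += prod(mu[b] for b in cand)
    return total


# Hypothetical instance: four events whose dependency graph is a 4-cycle.
MU = {"A": 0.2, "B": 0.2, "C": 0.2, "D": 0.2}
EDGES = {frozenset(e) for e in [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]}

X = independent_set_sum(MU, EDGES)          # distortion factor
PRODUCT = prod(1 + m for m in MU.values())  # middle term of the proof
CRUDE = exp(sum(MU.values()))               # bound from Proposition 7.3
print(X, PRODUCT, CRUDE)
```

Here the independent sets are the empty set, the four singletons, and the two opposite pairs, so the distortion factor is $1 + 4(0.2) + 2(0.2)^2 = 1.88$, below both upper bounds.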

In most applications of the LLL, we keep track of independent sets of bad-events in terms of
their variables: namely, for each variable $i$, an independent set $I$ can contain at most one
bad-event involving $i$. The following result shows how this variable-based accounting can yield a better
estimate for the entropy:

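Theorem 7.4. For any bad-event $B$, define

\[ y(B) = (1 + \mu(B))^{1/|\mathrm{var}(B)|} - 1 \]

Then we have

\[ \sum_{\substack{I \subseteq \mathcal{B} \\ I \text{ independent}}} \prod_{B \in I} \mu(B) \le \prod_{i \in [n]} \Bigl( 1 + \sum_{\substack{B \in \mathcal{B} \\ B \text{ involves variable } i}} y(B) \Bigr) \]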

Proof. We can expand the RHS as a polynomial $Q$ in the values $y(B)$ where $B$ ranges over $\mathcal{B}$.
Given an independent set $I \subseteq \mathcal{B}$, we say that a monomial in the terms $y$ is supported on $I$ if, for
each $B$, the exponent of $y(B)$ is positive iff $B \in I$.

For any set $I$, define $q(I)$ to be the sum of all monomials of $Q$ supported on $I$. Thus, for
example if $I = \{B\}$ then $q(I)$ is the sum over all terms in the RHS of the form $y(B)^j$, for $j \ge 1$.

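Now, observe that if $J, J'$ are distinct subsets of $\mathcal{B}$, then the monomials supported on $J, J'$ are
disjoint. Furthermore, $q(J) \ge 0$ for all $J \subseteq \mathcal{B}$. Thus

\[ \prod_{i \in [n]} \Bigl( 1 + \sum_{\substack{B \in \mathcal{B} \\ B \text{ involves variable } i}} y(B) \Bigr) = \sum_{J \subseteq \mathcal{B}} q(J) \]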

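We now claim that for any independent set $I \subseteq \mathcal{B}$, we have

\[ \prod_{B \in I} \mu(B) = q(I) \tag{12} \]

This equation (12) implies that

\begin{align*}
\sum_{\substack{I \subseteq \mathcal{B} \\ I \text{ independent}}} \prod_{B \in I} \mu(B) &\le \sum_{\substack{I \subseteq \mathcal{B} \\ I \text{ independent}}} q(I) \\
&\le \sum_{J \subseteq \mathcal{B}} q(J) && \text{as } q(J) \ge 0 \text{ for all } J \subseteq \mathcal{B} \\
&= \prod_{i \in [n]} \Bigl( 1 + \sum_{\substack{B \in \mathcal{B} \\ B \text{ involves variable } i}} y(B) \Bigr)
\end{align*}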
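which is what we are trying to show. So we now move on to prove (12).

For any set $J \subseteq \mathcal{B}$ (not necessarily independent), we may produce a monomial supported on
$J$ by selecting, for each $i = 1, \dots, k$, some set of variables $R_i \subseteq \mathrm{var}(B_i)$, $R_i \neq \emptyset$, and furthermore
$R_1, \dots, R_k$ are all disjoint. Thus, for any $J = \{B_1, \dots, B_k\} \subseteq \mathcal{B}$ we have

\[ q(J) = \sum_{\substack{R_1, \dots, R_k \\ R_1, \dots, R_k \text{ disjoint} \\ R_i \subseteq \mathrm{var}(B_i) \\ R_i \neq \emptyset}} y(B_1)^{|R_1|} \cdots y(B_k)^{|R_k|} \]

Observe that if $I$ is independent, then any such $R_1, \dots, R_k$ are automatically disjoint. Thus,
for independent $I = \{B_1, \dots, B_k\} \subseteq \mathcal{B}$, we have

\[ q(I) = \sum_{\substack{R_1, \dots, R_k \\ R_i \subseteq \mathrm{var}(B_i) \\ R_i \neq \emptyset}} y(B_1)^{|R_1|} \cdots y(B_k)^{|R_k|} = \prod_{i=1}^{k} \sum_{\substack{R \subseteq \mathrm{var}(B_i) \\ R \neq \emptyset}} y(B_i)^{|R|} = \prod_{i=1}^{k} \bigl( (1 + y(B_i))^{|\mathrm{var}(B_i)|} - 1 \bigr) = \prod_{i=1}^{k} \mu(B_i) \]

So, we have shown that for independent $I \subseteq \mathcal{B}$ we have $q(I) = \prod_{B \in I} \mu(B)$.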

□ 
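Theorem 7.4 can likewise be sanity-checked by enumeration on a toy instance. The sketch below makes several assumptions for illustration: events are identified with the sets of variables they involve, two events are treated as dependent exactly when they share a variable, and the instance (four events on four variables arranged in a cycle, each with $\mu = 0.2$) and helper names are invented.

```python
from itertools import combinations
from math import prod, sqrt, isclose


def independent_sum(mu, var):
    """Sum over sets I of pairwise variable-disjoint events of prod_{B in I} mu(B)."""
    events = list(mu)
    total = 0.0
    for size in range(len(events) + 1):
        for cand in combinations(events, size):
            if all(not (var[a] & var[b]) for a, b in combinations(cand, 2)):
                total += prod(mu[b] for b in cand)
    return total


def variable_bound(mu, var):
    """Right-hand side of Theorem 7.4: prod_i (1 + sum_{B involves i} y(B))."""
    y = {b: (1 + mu[b]) ** (1 / len(var[b])) - 1 for b in mu}
    variables = set().union(*var.values())
    return prod(1 + sum(y[b] for b in mu if i in var[b]) for i in variables)


# Hypothetical instance: each event involves two of the variables 0..3.
VAR = {"A": {0, 1}, "B": {1, 2}, "C": {2, 3}, "D": {3, 0}}
MU = {name: 0.2 for name in VAR}

LHS = independent_sum(MU, VAR)
RHS = variable_bound(MU, VAR)
CRUDE = prod(1 + m for m in MU.values())  # product bound from the proof of Proposition 7.3
print(LHS, RHS, CRUDE)
```

On this instance the variable-based bound lies between the exact distortion factor and the cruder product bound, which matches the motivation given for the theorem; that ordering is a property of this example and is not asserted in general.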


We give an example for independent transversals. Given a graph $G$ with its vertices partitioned
into blocks $V = V_1 \cup V_2 \cup \dots \cup V_k$, an independent transversal (also known as an independent system
of representatives) of $G$ is a set $I$ such that $|I \cap V_i| = 1$ for each $i = 1, \dots, k$, and such that $I$ is
an independent set of $G$. This structure has received significant attention, starting in [7].
Currently, the best algorithms for producing independent transversals come from the LLL and the
MT algorithm; see [6] and [31].


Proposition 7.5. Suppose we have a graph $G$ of maximum degree $\Delta$, with its vertex set partitioned
into $k$ blocks each containing $b$ vertices, such that $b \ge 4\Delta$. Suppose we run the MT algorithm to find
an independent transversal, using the natural probability distribution (selecting one vertex
independently from each block). Then the MT algorithm terminates and the resulting probability space has
min-entropy at least

\[ H_\infty(\Omega') \ge k \ln \frac{4b}{2 + b/\Delta - \sqrt{b^2/\Delta^2 - 4b/\Delta}} \]


Proof. The min-entropy of $\Omega$ is $-\ln b^{-k} = k \ln b$.

The probability distribution $\Omega$ selects a node from each block uniformly at random. For each
edge $f = (u, v) \in G$ we have a bad-event that $u, v$ are both selected for the independent transversal.
It is an easy exercise to see that the asymmetric LLL criterion is satisfied by setting $\mu(B) = \alpha = \frac{(b - \sqrt{b(b - 4\Delta)})^2}{4 b^2 \Delta^2}$
for $B \in \mathcal{B}$. Thus, we have $y(B) = (1 + \alpha)^{1/2} - 1$.

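In this setting, a variable corresponds to a block. There are at most $2b\Delta$ bad-events involving
each block and so we have

\begin{align*}
\prod_{\text{variables } i} \Bigl( 1 + \sum_{B \text{ involves variable } i} y(B) \Bigr) &\le \prod_{\text{blocks } i} \bigl( 1 + 2b\Delta ((1 + \alpha)^{1/2} - 1) \bigr) \\
&= \Biggl( 1 + 2b\Delta \Biggl( \sqrt{1 + \frac{(b - \sqrt{b(b - 4\Delta)})^2}{4 b^2 \Delta^2}} - 1 \Biggr) \Biggr)^{k}
\end{align*}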
Now, suppose that $b/\Delta = x$, where $x \ge 4$ is a fixed value; then simple calculus shows that
the expression $1 + 2b\Delta \Bigl( \sqrt{1 + \frac{(b - \sqrt{b(b - 4\Delta)})^2}{4 b^2 \Delta^2}} - 1 \Bigr)$ is an increasing function of $\Delta$ which approaches
$\frac{1}{2} \bigl( x - \sqrt{x(x - 4)} \bigr)$ from below. Thus, we have that

\[ 1 + 2b\Delta \Biggl( \sqrt{1 + \frac{(b - \sqrt{b(b - 4\Delta)})^2}{4 b^2 \Delta^2}} - 1 \Biggr) \le \frac{1}{2} \Bigl( (b/\Delta) - \sqrt{(b/\Delta)(b/\Delta - 4)} \Bigr). \]

By Theorem 7.4 this implies that

\begin{align*}
H_\infty(\Omega') &\ge k \ln b - k \ln \Bigl( \frac{2 + b/\Delta - \sqrt{b^2/\Delta^2 - 4b/\Delta}}{4} \Bigr) \\
&= k \ln \frac{4b}{2 + b/\Delta - \sqrt{b^2/\Delta^2 - 4b/\Delta}}
\end{align*}


□ 
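To get a feel for the size of the bound stated in Proposition 7.5, the sketch below evaluates it and compares it with the cruder estimate from Proposition 7.3. Two points are assumptions of this example rather than statements from the text: the crude estimate is computed by bounding the number of edges by $kb\Delta/2$ and using $\mu(B) = \frac{(b - \sqrt{b(b-4\Delta)})^2}{4b^2\Delta^2}$ for every edge, and the function names and sample sizes are hypothetical.

```python
import math


def transversal_min_entropy_bound(k, b, delta):
    """Lower bound k * ln(4b / (2 + x - sqrt(x^2 - 4x))) with x = b / delta."""
    x = b / delta
    if x < 4:
        raise ValueError("the bound requires b >= 4 * delta")
    return k * math.log(4 * b / (2 + x - math.sqrt(x * x - 4 * x)))


def crude_min_entropy_bound(k, b, delta):
    """k ln b minus the sum of mu(B), assuming at most k*b*delta/2 edges."""
    if b < 4 * delta:
        raise ValueError("the bound requires b >= 4 * delta")
    mu = (b - math.sqrt(b * (b - 4 * delta))) ** 2 / (4 * b * b * delta * delta)
    return k * math.log(b) - (k * b * delta / 2) * mu


# Arbitrary sample sizes (assumptions for illustration), in the regime b = 4 * delta.
K, DELTA = 10, 5
B = 4 * DELTA
REFINED = transversal_min_entropy_bound(K, B, DELTA)
CRUDE = crude_min_entropy_bound(K, B, DELTA)
print(REFINED, CRUDE, K * math.log(B))
```

At $b = 4\Delta$ the refined bound evaluates to $k(\ln b - \ln(3/2))$ and the crude one to $k(\ln b - 1/2)$, so the refined bound loses about $0.405$ nats per block compared with $0.5$ for the crude one.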


We see that the distortion of $\Omega'$ is relatively mild. When $b = 4\Delta$, then the min-entropy is
< k(lnb— ln3/2). When 6 A, the min-entropy is (up to first order) fc(ln6— ^ — 0((A/6) 5 / 2 ). 

By comparison, the cruder Proposition 7.3 would give estimates in these two regimes of, respectively,


A:(In6 — 1/2) and A:(In6 — ^ ^ — 0((A/6) 3 ). 

Finally, we give an example for partially satisfying $k$-SAT. This is, to our knowledge, the first
result to show that not only is the $k$-SAT problem partially satisfiable, but that it has many partial
solutions (indeed, exponentially many solutions).


Proposition 7.6. Suppose we have a k-SAT instance with m clauses and n variables, in which 

each variable participates in up to L < ^- 2/k clauses (either positively or negatively), for 

$\alpha \in [1, e]$. Then there are at least

2 n 

exp(ffi±^M)p 0 I|/(m) 

assignments which satisfy at least $m(1 - 2^{-k} e \ln(\alpha)/\alpha) - 1$ clauses, where we define

\[ \beta = 1 - \ln \alpha \]


Proof. We run the MT algorithm as in Theorem 6.3 and compute $H_\rho$ of the resulting distribution.
Using the notation of Theorem 16.31 we have A w. Observe that, by double-counting 

m < nL/k and so we have 

Next, we compute H p of the original distribution. Each variable is Bernoulli with mean 1/2 + 
x(l/2 — <5) < 1/2 + so we have 

"<■<«>* rb ta (< 1 ' i, + s ) ' + < i/ 2 -JX) 

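Hence by Theorem 7.2 we have

\[ H_\rho(\Omega') \ge \frac{n}{1 - \rho} \ln \Bigl( \bigl(1/2 + \tfrac{1}{2k}\bigr)^\rho + \bigl(1/2 - \tfrac{1}{2k}\bigr)^\rho \Bigr) - \frac{\rho}{\rho - 1} \cdot \frac{2 n \beta}{k^2} \]

We set $\rho = 1 + 2\beta^{1/2}$ and use the identity $\ln((1/2 + w)^\rho + (1/2 - w)^\rho) \le (1 - \rho) \ln 2 - 2(1 - \rho)\rho w^2$
to obtain:

\[ H_\rho(\Omega') \ge n \Bigl( \ln 2 - \frac{(1 + 2\beta^{1/2})^2}{2k^2} \Bigr) \]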


In the resulting probability distribution, the expected number of failed constraints is $m 2^{-k} e \ln(\alpha)/\alpha$.
Hence, by Markov's inequality we fail at most $m 2^{-k} e \ln(\alpha)/\alpha + 1$ constraints with probability at
least poly(l/m). Thus, the entropy of O' conditioned on this event is at least ra(ln2 — _ 

O(logm). The result follows. 

□ 